Sorry to get off topic. My point is that using MIDI as a way to characterize music is a bad model, and is likely to lead to wrong conclusions.
Not to say that algorithmic analysis is hopeless. Someone wrote a program that generated chorales in the style of Bach, and they are pretty good. I have also read that someone made an algorithm for generating pop hits, and considering how generic a lot of pop sounds, it is probably being used.
MIDI is not even useful for cataloging music by 'incipits' (the first few notes of a theme), since it does not differentiate between sharp and flat. Whenever I import a MIDI file into Finale (a music notation program), I have a couple hours of clean-up just to make the music look presentable -- and that is when I have the score available.
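To make the enharmonic problem concrete: a MIDI file stores pitch as a single integer from 0 to 127, so a notation program can only guess the spelling. A minimal Python sketch (the helper function and its spelling table are my own, purely illustrative):

```python
# MIDI collapses enharmonic spellings into one number: C-sharp and
# D-flat are both note number 61, so the distinction is unrecoverable.
def midi_to_spellings(note_number):
    """Return the common spellings that all map to one MIDI note number."""
    names = {
        0: ["C"], 1: ["C#", "Db"], 2: ["D"], 3: ["D#", "Eb"],
        4: ["E"], 5: ["F"], 6: ["F#", "Gb"], 7: ["G"],
        8: ["G#", "Ab"], 9: ["A"], 10: ["A#", "Bb"], 11: ["B"],
    }
    octave = note_number // 12 - 1      # MIDI convention: note 60 = C4
    return [f"{n}{octave}" for n in names[note_number % 12]]

print(midi_to_spellings(61))  # ['C#4', 'Db4'] -- the file cannot say which
```

A replacement format would store the spelling (step, accidental, octave) rather than a bare number, the way engraving formats already do.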
There is an opportunity to replace MIDI with something less primitive, since the old constraints on memory and storage no longer apply. Making it easy to change temperaments is a start, as is a way to record whether a note is spelled sharp or flat (or double-sharp or double-flat). These distinctions are irrelevant on the piano but important for other instruments. A notation for microtones would be useful for modern music. Specifying the attack profile and sound envelope would make the standard useful enough to roughly render most music performances other than voice.
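To make "attack profile and sound envelope" concrete, the simplest parameterization is a piecewise-linear ADSR (attack, decay, sustain, release) curve. This is only a sketch in Python; the parameter names and default values are my own assumptions, not part of any existing or proposed standard:

```python
def adsr(t, attack=0.02, decay=0.1, sustain=0.6, release=0.3, note_off=0.8):
    """Amplitude (0..1) of a piecewise-linear ADSR envelope at time t seconds.

    attack:   seconds to ramp from silence to full amplitude
    decay:    seconds to fall from full amplitude to the sustain level
    sustain:  held amplitude until note_off
    release:  seconds to fade from sustain back to silence
    """
    if t < attack:                       # ramp up
        return t / attack
    if t < attack + decay:               # fall to sustain level
        frac = (t - attack) / decay
        return 1.0 - frac * (1.0 - sustain)
    if t < note_off:                     # hold
        return sustain
    if t < note_off + release:           # fade out
        frac = (t - note_off) / release
        return sustain * (1.0 - frac)
    return 0.0
```

A richer standard could attach a few such envelope parameters per instrument or per note, which is enough to distinguish, say, a plucked attack from a bowed one.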
It would then be able, for example, to handle Harry Partch's just intonation. It would also be able to properly render non-Western music, much of which does not use equal temperament at all.
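For concreteness, here is a minimal Python sketch of the difference between equal temperament and just intonation, assuming the usual A4 = 440 Hz reference (the function names are mine, purely illustrative):

```python
import math

A4 = 440.0  # reference pitch in Hz; an assumption, not stored in MIDI itself

def equal_tempered(midi_note):
    """12-tone equal temperament: every semitone is a factor of 2**(1/12)."""
    return A4 * 2 ** ((midi_note - 69) / 12)

def just(ratio, root_hz=A4):
    """A just-intonation pitch expressed as an exact frequency ratio."""
    return root_hz * ratio

# The fifth above A4: equal temperament vs the pure 3/2 ratio.
tempered_fifth = equal_tempered(76)   # E5 in 12-TET, about 659.26 Hz
pure_fifth = just(3 / 2)              # exactly 660 Hz
cents = 1200 * math.log2(pure_fifth / tempered_fifth)  # about 2 cents

# An 11-limit interval such as 11/8, of the kind used in just-intonation
# systems like Partch's, falls between MIDI's semitones entirely.
undecimal_fourth = just(11 / 8)
```

A ratio-based (or cents-based) pitch field in the format would represent all of these exactly, where MIDI can only approximate them with pitch-bend messages.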
I am afraid that any effort that relies on MIDI will not reflect any of the main features of music, so I offer this as a suggestion.
Using actual performance as a source of data is problematic since the size of the dataset is pretty limited. Just using Western 'art-music' as an example, there are probably 10 or 15 recordings of each of the Beethoven symphonies, and a lot fewer of most other classical compositions. If you wanted to use the equivalent of ImageIdentify[] to guess the composer from the music, you might be able to tell the difference between Beethoven and Bartok, but not between Bach and Telemann, or Mahler and Richard Strauss. (Depending on the length of the passage, I sometimes still confuse the latter two.)
I did some work in this area back before MIDI, when you could fit all the composers who also knew how to program into a small room. It is a non-trivial problem. I am convinced that the starting point to gaining any real insight is to replace MIDI. Wolfram Language is certainly up to the task.