Thank you for your interest in this project, Vitaliy!
This project took me one week in total: around two days to set up MIDI data loading, conversion to numeric data, and training, and five days of experimenting with different architectures on my RTX 2060 GPU. I let each architecture train for around 12 hours, so most of this time investment was passive.
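For anyone curious what that preprocessing step looks like in the Wolfram Language, here is a rough sketch of the kind of pipeline I mean. The pitch-only encoding, window length, layer sizes, and the "midi" folder name are all illustrative placeholders, not my exact setup:

```
(* gather SoundNote objects from every MIDI file in a (hypothetical) "midi" folder *)
files = FileNames["*.mid", "midi"];
notes = Flatten[Import[#, "SoundNotes"] & /@ files];

(* simplification: keep only each note's pitch and ignore timing/duration *)
pitches = Cases[notes, SoundNote[p_, ___] :> p];
classes = Union[pitches];
toIndex = AssociationThread[classes -> Range[Length[classes]]];

(* sliding windows: 32 pitches in, predict the next one *)
windows = Partition[pitches, 33, 1];
trainingData = Lookup[toIndex, Most[#]] -> Last[#] & /@ windows;

(* a small recurrent next-note classifier *)
net = NetChain[{
    EmbeddingLayer[64, Length[classes]],
    GatedRecurrentLayer[128],
    SequenceLastLayer[],
    LinearLayer[Length[classes]],
    SoftmaxLayer[]},
   "Output" -> NetDecoder[{"Class", classes}]];

trained = NetTrain[net, trainingData, TargetDevice -> "GPU"];
```

The real pipeline also has to deal with durations, rests, and polyphony, which is where most of those two days went.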
Unfortunately, I cannot publish my dataset, since it contains derivatives of copyrighted works. However, the works I used can be downloaded from MuseScore as MIDI files with a "Pro" subscription. If you want to go this route, I would encourage you to hand-pick music that speaks to you.
Most of my dataset consists of my personal favorite modern piano compositions and piano arrangements of popular-ish music. I could publish a full list with links if anyone is interested. Some notable works include:
- 24 Preludes from Chopin's Opus 28
- River Flows In You by Yiruma
- Flower Dance by DJ Okawari
There are many larger MIDI music datasets available, and I think some of these would be a nice addition to the Wolfram Data Repository. My original idea was to pre-train a network on such datasets and then fine-tune on my hand-selected favorites, but I have not invested the time yet. Maybe I can do this in the future.
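If I do get around to it, the fine-tuning step itself should be simple in principle; something like the sketch below, where `pretrainedNet` and `favoritesData` are hypothetical names for a net trained on a large public corpus and for my hand-picked pieces encoded the same way:

```
(* hypothetical: continue training the pre-trained net on the small, hand-picked set *)
fineTuned = NetTrain[pretrainedNet, favoritesData,
   LearningRate -> 0.0001, (* keep the rate low so the large-corpus weights are not washed out *)
   TargetDevice -> "GPU"];
```

The real work would be in curating the large corpus and making its encoding match mine, not in the training call.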
Training seems to take such a long time with RNNs (even on my comparatively tiny dataset) that I would really like to experiment with transformer architectures before investing too many more GPU-days into training! I would love to see a GPT-2-style music net in the Wolfram Neural Net Repository someday.