
Training a recurrent neural network (RNN) to generate piano music

Posted 3 years ago
POSTED BY: Alec Graves
7 Replies

Thank you, Alec, for posting this wonderful example of applying RNNs to MIDI music files. It is something I have wanted to do myself. Another valuable topic might be to classify MIDI files by their underlying musical patterns and then create new music by modifying those patterns with techniques like inversion, transposition, etc. Do you know of any work in that area using MMA?

Regards, Michael

POSTED BY: Michael Kelly
Posted 3 years ago

I am glad you enjoyed this project, Michael. It is encouraging that you were thinking of a similar project: great minds think alike, as they say!

Your suggestion of classifying patterns and transposing / inverting / etc. them brings to mind the analogy of word-level versus character-level encoding for tokenizers in language models. Word-level encodings typically result in higher performance on language processing tasks, so I would expect pattern-level encodings to perform better on musical data as well.

I have thought some about creating word-level encodings for music (e.g. an extension of BPE/SentencePiece/WordPiece), but I was unable to come up with a robust method of encoding the metadata for musical words (the timings and volumes of the notes in each word). A successful approach would likely involve timing quantization and categorizing notes into timing patterns (e.g. triplet, quarter note, 16th note), as is done in sheet music. Even combining a chord into a single token is tricky, because the token would likely also need to record whether or not to play the chord as an arpeggio.
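
As a rough illustration of the BPE idea applied to notes, here is a minimal Python sketch. It assumes each note has already been reduced to a bare pitch token (a MIDI note number as a string) and deliberately ignores timing and volume, which is exactly the metadata problem described above. The function names are hypothetical, and this is not the encoding used in this project.

```python
from collections import Counter


def merge_pair(seq, pair, new_token):
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_merges(seq, num_merges, min_count=2):
    """Greedily learn up to `num_merges` BPE merges over one token sequence.

    Each step merges the most frequent adjacent pair (first seen wins ties)
    and stops early once no pair occurs at least `min_count` times.
    """
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        new_token = f"{pair[0]}+{pair[1]}"
        seq = merge_pair(seq, pair, new_token)
        merges.append((pair, new_token))
    return seq, merges


# A repeated C-major arpeggio (C4 E4 G4) followed by C5, as MIDI note numbers.
notes = ["60", "64", "67", "60", "64", "67", "72"]
merged, merges = learn_merges(notes, num_merges=5)
print(merged)  # ['60+64+67', '60+64+67', '72']
```

Two merges collapse each arpeggio into a single "60+64+67" word. Because durations and velocities were thrown away, nothing in that token says whether the notes are played together as a chord or one after another as an arpeggio; a real music tokenizer would still have to attach that information.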

Your comments about inversion, transposition, etc. also bring to mind another idea I was thinking of: encoding pitch values as {note-name, octave} pairs. The current implementation encodes each note pitch as a one-hot vector over 120 possible pitches, which likely makes simple octave shifts or inversions quite tricky for the model to learn: with 120 separate slots, C4 and C5 share no input features at all. If each pitch were instead encoded as a length-12 one-hot vector for the note name (A, A#, B, ...) plus a length-10 one-hot vector for the octave, octave shifts and chord inversions should become much simpler and faster for the net to learn, even with our 'character-level', note-by-note encoding of the music data.
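
The factored pitch encoding can be sketched in a few lines of plain Python. This assumes the 120 pitch indices are numbered like MIDI note numbers 0 to 119, where index 0 is a C, so note-name index 0 here means C rather than A. The names `encode_pitch` and `decode_pitch` are hypothetical and not taken from the project's code.

```python
from typing import List

NUM_PITCHES = 120                        # pitch indices 0..119, matching the 120-way one-hot
NUM_NAMES = 12                           # note names (pitch classes) per octave
NUM_OCTAVES = NUM_PITCHES // NUM_NAMES   # 10 octaves


def argmax(values: List[float]) -> int:
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def encode_pitch(pitch: int) -> List[float]:
    """Encode a pitch as a 12-way note-name one-hot followed by a 10-way octave one-hot."""
    if not 0 <= pitch < NUM_PITCHES:
        raise ValueError(f"pitch must be in [0, {NUM_PITCHES}), got {pitch}")
    name, octave = pitch % NUM_NAMES, pitch // NUM_NAMES
    vec = [0.0] * (NUM_NAMES + NUM_OCTAVES)
    vec[name] = 1.0
    vec[NUM_NAMES + octave] = 1.0
    return vec


def decode_pitch(vec: List[float]) -> int:
    """Invert encode_pitch; using argmax also works on soft (probability) outputs."""
    name = argmax(vec[:NUM_NAMES])
    octave = argmax(vec[NUM_NAMES:])
    return octave * NUM_NAMES + name


# Middle C (MIDI 60) and the C one octave up (MIDI 72) share the note-name half.
c4, c5 = encode_pitch(60), encode_pitch(72)
print(c4[:NUM_NAMES] == c5[:NUM_NAMES])  # True: only the octave part differs
```

The vector shrinks from 120 entries to 22, and an octave shift touches only the last 10 entries. A network predicting this encoding would presumably need two separate softmax outputs (one for note name, one for octave) instead of a single 120-way softmax.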

Honestly, I have not done any research into what others have done in the music space. I would assume some clever methods were used to encode notes into higher-level patterns when earlier music generation work was done using Markov models.

Thanks for the thought-provoking suggestions, and sorry for the lengthy reply!

POSTED BY: Alec Graves

I was surprised by how pleasant these music tracks actually are. Thank you for sharing! How much time did this project take? I am also curious about the data: did you publish your dataset, maybe in that GitHub repo? Is it based entirely on modern classical composers, or is there some older classical music in the training data too? Any famous composers?

I built my dataset from a collection of my favorite piano scores from MuseScore. You want a lot of different .mid files so that the network you train cannot simply memorize all of your training data and instead has to generalize to some degree. Our data augmentation strategy helps with this, but it will not completely solve the problem of having too little data. I have found that 150 songs is acceptable when using smaller RNNs.

By the way, regarding repositories: you might consider Wolfram's Neural Net Repository and Data Repository for your work.

POSTED BY: Vitaliy Kaurov
Posted 3 years ago

Thank you for your interest in this project, Vitaliy!

This project took me one week in total: around two days to set up MIDI data loading, conversion to numeric data, and training, and five days of experimenting with different architectures on my RTX 2060 GPU. I let each architecture train for around 12 hours, so most of this time investment was passive.

Unfortunately, I cannot publish my dataset, since it contains derivatives of copyrighted works. However, the works used can be downloaded from MuseScore as MIDI files with a "Pro" subscription. If you want to go this route, I would encourage you to hand-choose music that speaks to you.

Most of my dataset consists of my personal favorite modern piano compositions or arrangements of popular-ish music for piano. I could publish a full list with links if anyone is interested. Some notable works used in my dataset include:

  • 24 Preludes from Chopin's Opus 28
  • River Flows In You by Yiruma
  • Flower Dance by DJ Okawari

There are many larger MIDI music datasets available, and I think some of these would be a nice addition to the Wolfram Data Repository. My original idea was to pre-train a network on such datasets and then fine-tune it on my hand-selected favorites, but I have not invested the time yet. Maybe I can do this in the future.

Training RNNs seems to take so long (even on my comparatively tiny dataset) that I would really like to experiment with transformer architectures before investing many more GPU-days in training! I would love to see a GPT2-Music net in the Wolfram Neural Net Repository some day.

POSTED BY: Alec Graves
Posted 3 years ago

Also, many works on MuseScore.com have been published into the public domain, including works by many great classical composers. These public domain works can be downloaded for free without a "Pro" subscription, since there are no royalties to pay, and they could form the basis of a dataset that could be freely distributed.

Such a dataset could be uploaded to the Wolfram Data Repository without issue!

POSTED BY: Alec Graves

Congratulations! You have earned a Featured Contributor Badge. Your exceptional post has been selected for our editorial column Staff Picks (http://wolfr.am/StaffPicks), and your profile is now distinguished by a Featured Contributor Badge and is displayed on the Featured Contributor Board. Thank you!

POSTED BY: EDITORIAL BOARD