[WSS18] Generating Music with Expressive Timing and Dynamics

Posted 1 year ago | 3330 Views | 16 Replies | 20 Total Likes
16 Replies

Congratulations! This post is now a Staff Pick as distinguished by a badge on your profile! Thank you, keep it coming!

Posted 11 months ago

Very nice. You generated jazz :) The only possible outcome.

Posted 11 months ago

Thank you!

Yeah... jazz was only a small part of the dataset, but it had a big influence :)

Nice project. You must realize that MIDI is absolutely awful at describing 'real' music. For a keyboard player, it might be sufficient to specify pitch and duration, using equal temperament, but for a wind player or especially a string player or singer, most of the stylistic information is missing. This is why a MIDI rendering of a solo violin piece (Bach's Chaconne, for example) or a flute piece is flat and mechanical.

I looked into inventing a replacement for MIDI, one where you could specify the attack -- which is highly variable for most instruments -- and the way the sound evolves over time. However, it became apparent that even for 'simple' music, there were simply too many variables. It was easier to just learn the instrument. ;-) I am just an amateur flute player, but I can vary the note attack in 10 or 15 different ways, and change the way the sound develops, including adjusting the timbre during the note's elaboration, in several more ways. A professional can do a lot more.

Professional musicians can play in strict time, of course. However, for many types of music, such as Chopin, the controlled deviation from strict time is an important performance criterion.

Anonymous User
Posted 11 months ago

quantized chopin

fake flute in fake space

Edited to add: The mod deleted the part where I posted a video of Mathematica crashing when I search the docs for "Audio"; deleted the part where I disclosed I don't use Mathematica for audio work; but left up the samples of my audio work. I used Emacs, C, and Python, not Mathematica.

POSTED BY: Anonymous User
Posted 11 months ago

This is nicely generated music, Joe. I didn't get a chance to answer your initial comment, but I remember something about an algorithmic way of generating it. Were these tracks composed algorithmically?

Anonymous User
Posted 11 months ago

I'd mentioned that algorithmic composition is my personal focus (and has been for decades); but only Chopin knows how he composed the first sample, and the second is just notes I chose non-algorithmically to demonstrate the sound of the flute.

POSTED BY: Anonymous User
Posted 11 months ago

I agree that the MIDI representation loses an enormous part of the musical information. This is the price you pay when you want to generalize a music notation. To be honest, I don't know much about classical music notation or the MIDI spec, but as I understand it, MIDI tries to "digitize" the classical one. Correct me if I'm wrong, but classical music notation also lacks the things you've mentioned.

If we want to learn specific techniques of playing an instrument, we must learn them from raw music data (sound). Here is the latest research in this direction: https://twitter.com/deepmindai/status/1012290879120429056

Anonymous User
Posted 11 months ago

"... I don't know much about a classic music notation or MIDI notation but as I understand MIDI tries to 'digitize' a classic one. Correct me if I'm wrong but a classic music notation also lacks those things that you've mentioned."

MIDI is a finite representation, so it can be stored and recalled by Turing machines; classical notation has the same characteristic, as it can be reduced to a finite number of symbols arranged in a computable layout.
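
As a concrete instance (drawn from the MIDI spec itself, not from anything in this thread): an entire note-on event fits in three bytes, which is exactly the kind of finite, Turing-storable representation described above. In the Wolfram Language:

    (* A complete MIDI note-on event: status byte 0x90 (note-on, channel 1),
       key number 60 (middle C), velocity 100 -- three bytes in total *)
    noteOn = ByteArray[{16^^90, 60, 100}]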

"... If we want to learn specific techniques of playing an instrument we must learn it from raw music data(sound). Here is the latest research in this directions: https://twitter.com/deepmindai/status/1012290879120429056"

What you call "raw music data (sound)" isn't the real thing either, though. It's WAV or MP3 format, which again is just a finite, i.e., digital, representation.

It was suggested elsewhere in this thread that a finite set of differential equations would suffice to completely describe the real thing; but that assertion is unfounded.

POSTED BY: Anonymous User
Posted 11 months ago

You are right: an audio representation is just another digital version of music, but it's much more detailed and closer to reality than MIDI. With this kind of data, you can learn the timbre of an instrument and how it sounds under special conditions (techniques of playing).

As you can hear in the tweet I posted above, the piano their model generated sounds pretty realistic. The sound quality is poor for now, though.

I have read about and heard examples where recordings of piano music -- specifically Glenn Gould's recording of Bach's Goldberg Variations -- were analyzed and encoded so that one of those automatic pianos (a regular grand piano with actuators replacing the fingers) could replay the performance. The results sounded pretty good, and there were concerts (live performances) of the result.

However, a piano is a very simple instrument to model, since the only parameters are the time and velocity of each key strike (plus pedal info, of course). It's pretty useless for modern extended techniques, such as Cage's pieces for prepared piano or some of George Crumb's stuff. (I studied composition with George Crumb a while ago.)

Doing a flute, oboe, or violin is much harder. Complicating matters from a technical viewpoint is that performance details change depending on the acoustics of the concert hall. This says nothing of the interpretive variations that you get when two or more people play together. Once you add extended techniques, such as multiphonics on the flute (more rarely the oboe or clarinet), the only real solution would be to build a model based on the differential equations of sound generation and use the raw data to discover the particular solutions.
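
To make "the differential equations of sound generation" concrete, here is a toy Wolfram Language sketch of my own (not anything proposed in this thread): a single damped oscillator standing in for one resonant mode of an instrument. A real model would couple many such modes, nonlinearly, to an excitation mechanism, which is where the difficulty lives.

    (* One damped 440 Hz mode of a vibrating string, solved numerically *)
    sol = NDSolve[{y''[t] + 20 y'[t] + (2 Pi 440)^2 y[t] == 0,
        y[0] == 0, y'[0] == 1}, y, {t, 0, 0.05}];
    Plot[Evaluate[y[t] /. sol], {t, 0, 0.05}, AxesLabel -> {"t (s)", "y"}]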

I don't see much point to this, other than to put a lot of musicians out of work, since a convincing performance would require a lot of work and expense.

I have dabbled in electronic music, and I think that its real strength is in finding new ways to generate organized sound (as Edgard Varèse called music) and not to try to imitate human produced music. As much as I like some of the music (which I have been listening to since the 1950s), I still prefer live performance.

Anonymous User
Posted 11 months ago

This is MIDI triggering sampled individual notes and parts of notes: https://soundcloud.com/philippe-baylac.

"The only real solution would be to make a model based on the differential equations"

There's a difference between those equations and the human listening experience. Our ears filter out all energy over 20 kHz right off the bat, and their dynamic range is limited too. When instruments are combined, some sounds perceptually mask other sounds. If listeners can't tell the difference in a double-blind test, it's good enough. If you like math, the math here is about figuring out which corners you can get away with cutting in your models.
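
To illustrate the 20 kHz point (a minimal Wolfram Language sketch, not from the original posts): the two tones below differ only in frequency, yet only the first is audible to human ears.

    (* A clearly audible 1 kHz sine tone, 2 seconds long *)
    audible = AudioGenerator[{"Sin", 1000}, 2];
    (* 22 kHz: representable at the default 44.1 kHz sample rate,
       but above the ~20 kHz ceiling of human hearing *)
    inaudible = AudioGenerator[{"Sin", 22000}, 2];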

POSTED BY: Anonymous User

Sorry to get off topic. My point is that using MIDI as a way to characterize music is a bad model, and it is likely to lead to wrong conclusions.

Not to say that algorithmic analysis is hopeless. Someone wrote a program that generated chorales in the style of Bach, and they are pretty good. I have also read that someone made an algorithm for generating pop hits, and considering how generic a lot of pop sounds, it is probably being used.

MIDI is not even useful for cataloging music by 'incipits' (the first few notes of a theme), since it does not differentiate between sharp and flat. Whenever I import a MIDI file into Finale (a music notation program), I have a couple of hours of clean-up just to make the music look presentable -- and that is when I have the score available.
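
To see the sharp/flat problem concretely (a sketch of mine, not from the original post): SoundNote in the Wolfram Language mirrors MIDI's chromatic key numbering (0 = middle C), so both spellings collapse to the same key, and a score reconstructed from MIDI has to guess the accidental.

    SoundNote["C#"]  (* one semitone above middle C *)
    SoundNote["Db"]  (* the same key, spelled differently *)
    Sound[{SoundNote["C#", 1], SoundNote["Db", 1]}]  (* two indistinguishable notes *)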

There is an opportunity to replace MIDI with something a bit less primitive, since there are no longer the old constraints on memory and storage. Making it easy to change temperaments is a start, as is a way to save the notations for sharp/flat or double-sharp/double-flat. These distinctions are not relevant to the piano, but are important for other instruments. Having a notation for microtones would be useful for modern music. Specifying the attack profile and sound envelope would make the standard useful enough to roughly render most music performances other than voice.

It would be able to, for example, make use of Harry Partch's intonation. It would also be able to properly render non-Western music, much of which does not use equal temperament at all.
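
As a purely hypothetical sketch of what one record in such a post-MIDI format might look like (every field name below is invented for illustration; this is not an existing standard), a Wolfram Language association can carry the spelling, temperament, and envelope information that MIDI discards:

    (* Hypothetical note event for a MIDI successor; all field names are invented *)
    noteEvent = <|
      "Pitch" -> <|"Spelling" -> "Db4", "FrequencyHz" -> 277.18|>,
      "Temperament" -> "Partch43",            (* or "Equal", "JustIntonation", ... *)
      "Onset" -> 1.25, "Duration" -> 0.43,    (* seconds *)
      "Attack" -> <|"Profile" -> "Breathy", "RiseTime" -> 0.02|>,
      "Envelope" -> {{0., 0.}, {0.02, 1.}, {0.35, 0.7}, {0.43, 0.}}  (* time-amplitude pairs *)
    |>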

I am afraid that any effort that relies on MIDI will not reflect any of the main features of music, so I offer this as a suggestion.

Using actual performance as a source of data is problematic since the size of the dataset is pretty limited. Just using Western 'art-music' as an example, there are probably 10 or 15 recordings of each of the Beethoven symphonies, and a lot fewer of most other classical compositions. If you wanted to use the equivalent of ImageIdentify[] to guess the composer from the music, you might be able to tell the difference between Beethoven and Bartok, but not between Bach and Telemann, or Mahler and Richard Strauss. (Depending on the length of the passage, I sometimes still confuse the latter two.)

I did some work in this area back before MIDI, when you could fit all the composers who also knew how to program into a small room. It is a non-trivial problem. I am convinced that the starting point to gaining any real insight is to replace MIDI. Wolfram Language is certainly up to the task.

Attachment

Anonymous User
Posted 11 months ago

I am curious as to the details of the algorithm you characterize here:

"Because music data is a sequence of events we need an architecture that knows how to remember, and predicts what is the next event based on all previous. This is exactly what Recurrent Neural Networks try to do - RNNs can use their internal state (memory) to process sequences of inputs. If you want to check more details I would recommend to watch this introduction video....."

Your post itself is clear to me, but after half an hour in the blogs and videos you link to, I still felt disoriented, so I didn't continue. Along the way, I got confused about why you are using a "training set, consisting of 1.5 GB of text from old novels and news articles" (Wolfram English Character-Level Language Model V1). How does knowledge about English novels and news articles apply to instrumental composition?

"On the abstract level, RNN learns the probabilities of events that follow after each other"

How well does it predict the stock market? Does it correctly predict the outcome of experiments turning on the unification of relativity and quantum mechanics? Why haven't you used it to win the lottery?

The video says at 17:09 that there are "no good rules" about how big the parameter to LongShortTermMemoryLayer should be. How musically useful (in the opinion of listeners) is it on parameters small enough to be computationally practical?

POSTED BY: Anonymous User
Posted 11 months ago

Your post itself is clear to me, but after half an hour in the blogs and videos you link to, I still felt disoriented, so I didn't continue. Along the way, I got confused about why you are using a "training set, consisting of 1.5 GB of text from old novels and news articles" (Wolfram English Character-Level Language Model V1). How does knowledge about English novels and news articles apply to instrumental composition?

Sorry, that's my bad. I didn't make it clear that I meant only the architecture of the Wolfram English Character-Level Language Model V1. I didn't use the model itself; I just borrowed the architecture for sequence modeling.
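
For anyone curious what such a sequence-prediction architecture looks like in the Wolfram Language, here is a minimal sketch (my own illustration, with an assumed vocabulary of 128 event tokens and illustrative layer sizes; it is not the exact network from the post):

    (* Embed each event token, run an LSTM over the sequence,
       and predict a distribution over the next token *)
    net = NetChain[{
        EmbeddingLayer[64, 128],        (* 128 possible event tokens -> 64-dim vectors *)
        LongShortTermMemoryLayer[256],  (* recurrent state carries the "memory" *)
        SequenceLastLayer[],            (* keep only the final state *)
        LinearLayer[128],
        SoftmaxLayer[]                  (* probabilities for the next token *)
      }, "Input" -> {"Varying"}]

Training pairs would map each window of events to the event that follows it, along the lines of NetTrain[net, windows -> nextEvents].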

How well does it predict the stock market? Does it correctly predict the outcome of experiments turning on the unification of relativity and quantum mechanics? Why haven't you used it to win the lottery?

It makes predictions based on the data it is given. In the case of a language model, you have a text, and the model can extract from it how often 'e' follows 'h', or that the combination 'zxc' never occurs. How would you do something similar in the case of the lottery? It's a sequence of random numbers. If you want to apply ML to stock markets, here is a handy new course: https://www.udacity.com/course/ai-for-trading--nd880
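
That kind of next-character statistic is easy to compute directly (a minimal Wolfram Language sketch on a built-in sample text, not part of the original exchange); an RNN learns a smoothed, context-dependent version of this table:

    (* Count how often each character follows each other character *)
    text = ExampleData[{"Text", "AliceInWonderland"}];
    bigrams = Counts[Partition[Characters[text], 2, 1]];
    bigrams[{"h", "e"}]  (* how often 'e' follows 'h' *)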

The video says at 17:09 that there are "no good rules" about how big the parameter to LongShortTermMemoryLayer should be. How musically useful (in the opinion of listeners) is it on parameters small enough to be computationally practical?

I suppose that in the case of MIDI events, a bigger parameter means a more generalized understanding of the data provided. But to be honest, I can't answer this question, because I didn't run enough tests to check the difference.

Anonymous User
Posted 11 months ago

How would you do something similar in the case of the lottery? It's a sequence of random numbers.

The lottery is often modeled as a random sequence, but the actual lottery is a physical phenomenon unfolding in compliance with the laws of physics; in that sense, the lottery is no more random than the infinite sequence -1, +1, -1, +1... as generated by a digital computer built of analog parts acting in compliance with the same laws of physics. So I don't totally understand what makes the lottery off-limits to this neural technology.

I am curious to challenge the neural-net technology. I would try to come up with algorithms that generate music that confounds the neural-net tech. If the neural-net can figure out the algorithm, the neural-net wins; otherwise the algorithm wins. But it is not readily apparent to me how to run these experiments in practice; i.e., I don't know what characters to type on my computer to get the neural-net to do its thing. All I remember from the video is the narrator saying neural-nets can do everything, but I didn't get the details on how to test that claim for truthfulness.

POSTED BY: Anonymous User