Introduction:
The goal of this project is to find differences in the tone of dialogues by translating human speech into music. Music is written in different keys; what key does human dialogue fall into when it is translated into music? I wanted to translate people's speech into the language of music and observe the connection between the tone of the speech and the musical key it produces. I am also interested in applying this idea to an automatic translator or voice-based speech transcription. I have been into music since I was very young: I grew up as a violinist, later started producing music, and now play the guitar and the piano. I have always loved listening to music. Through the Wolfram Language's audio processing functions, I learned about the music inside our daily speech, and I wanted to go deeper into the subject and observe whether there is any musical quality within our spoken language.
Background Information:
For this project, I define musical quality as follows: if the played notes or chords fit within a specific musical key, such as C major, D minor, or the Mixolydian mode, then the notes and chords have a musical quality. Therefore, the important part of this project is detecting the key of the played sound.
Algorithms and Code:
Converting speech into quantified data
The first step is to collect pitch data from the speech. Using the PitchRecognize function, the received speech is converted into a time series of pitch values.
davepitchdata = PitchRecognize[davecomedy, "QuantizedMIDI"]; (* recognize the pitch of the speech over time *)
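For context, davecomedy above holds the source recording (a stand-up comedy clip). A minimal sketch of how it might be loaded and inspected, assuming a hypothetical local file "davecomedy.mp3":
davecomedy = Import["davecomedy.mp3"]; (* load the speech clip as an Audio object *)
davepitchdata["Values"] // Short (* peek at the recognized pitch values *)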
Organizing the data
1) Unrecognized Pitch Elimination
During this process, all the Missing[] entries, corresponding to frames where no pitch was detected, were deleted.
davepitchDataClean = DeleteCases[davepitchdata["Values"], Missing[]]; (* keep only the frames where a pitch was detected *)
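As a toy illustration of what this cleaning step removes (the numbers here are made up):
DeleteCases[{60, Missing[], 62, Missing[], 64}, Missing[]] (* -> {60, 62, 64} *)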
2) Putting the data into a MIDI range
After the missing values were removed, the pitch data was shifted into the note number range used by SoundNote, where 0 is middle C and the 128 MIDI keys span -60 through 67. This was done using the Rescale function, here targeting the subrange -60 through 48.
daveroundednote = Round[Rescale[#, MinMax[davepitchDataClean], {-60, 48}] & /@ davepitchDataClean]; (* map the raw pitch values into the -60 to 48 note range and round to whole semitones *)
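For example, a raw value of 140, halfway through the roughly 100 to 180 range of the speech data, lands at the midpoint of the target range:
Round[Rescale[140, {100, 180}, {-60, 48}]] (* -> -6 *)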
3) Putting the data into a musical scale
Before the data could be fitted to a scale, the musical scale itself had to be constructed. This was done by, first, storing the names of all the modes as strings. Then each mode was assigned its set of semitone offsets, with 0 representing middle C. Finally, these offsets were extended across multiple octaves, reusing the same idea as the rescaling step above.
modes = {"Ionian", "Dorian", "Phrygian", "Lydian", "MixoLydian",
"Aeolian", "Locrian"};
modeSystemStructure = {"Ionian" -> {0, 2, 4, 5, 7, 9, 11},
"Dorian" -> {0, 2, 3, 5, 7, 9, 10},
"Phrygian" -> {0, 1, 3, 5, 7, 8, 10},
"Lydian" -> {0, 2, 4, 6, 7, 9, 11},
"MixoLydian" -> {0, 2, 4, 5, 7, 9, 10},
"Aeolian" -> {0, 2, 3, 5, 7, 8, 10},
"Locrian" -> {0, 1, 3, 5, 6, 8, 10}};
(* extend each mode's offsets across ten octaves, with 0 as middle C *)
modeSystemC = (# ->
     Table[(# /. modeSystemStructure) + 12 i, {i, 0, 9}] &) /@ modes;
(* assign each mode a transposition offset taken from the chosen mode's interval pattern, with the chosen mode itself at offset 0 *)
RotateMode[mode_] :=
 MapThread[
  Rule, {RotateLeft[modes,
    mode /. MapThread[Rule, {modes, Range[0, 6]}]], (mode /.
     modeSystemC)[[1]]}]
(* build a scale system rooted on the given mode: each mode's octave table shifted by its offset in the rotated system *)
BuildScaleSystem[modeSystem_, mode_String] :=
 Function[u, u -> ((u /. modeSystem) + (u /. RotateMode[mode]))] /@
  (RotateMode[mode][[All, 1]])
BuildScaleSystem[modeSystemC, "MixoLydian"]
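As a quick sanity check (the output here was traced by hand, so treat it as an assumption), the first octave of the system rooted on Mixolydian keeps the Mixolydian intervals unshifted:
First["MixoLydian" /. BuildScaleSystem[modeSystemC, "MixoLydian"]] (* -> {0, 2, 4, 5, 7, 9, 10} *)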
(* allowed notes: six lower octaves of the {0, 2, 4, 5, 7, 9, 10} pattern plus the first octave of the built Locrian system, flattened into one ascending list *)
LocrianScale =
  Flatten[Union[Table[{0, 2, 4, 5, 7, 9, 10} - 12 i, {i, 1, 6}],
    Take["Locrian" /. BuildScaleSystem[modeSystemC, "Locrian"], 1]]];
davescalednote = Nearest[LocrianScale, #] & /@ daveroundednote; (* snap each rounded note to the nearest allowed scale tone *)
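Nearest always returns the closest allowed tone; a toy example with the first-octave Locrian offsets:
Nearest[{0, 1, 3, 5, 6, 8, 10}, 7.2] (* -> {8} *)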
4) Choosing the instrument and arrangement
The instrument was chosen with an If statement inside the SoundNote expression, so the MIDI data is arranged for two separate instruments: one for lower-octave notes and one for upper octaves. In the code below, notes whose mean pitch is at or below -20 go to the harp and the rest go to the organ.
daveoutput = Sound[SoundNote[#, 0.009, If[Mean[#] <= -20, "Harp", "Organ"]] & /@ davescalednote] (* 0.009-second notes: harp for pitches at or below -20, organ above *)
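EmitSound plays the Sound object directly in the notebook, and Export should be able to write the notes out as a MIDI file (the file name daveoutput.mid is just an example):
EmitSound[daveoutput] (* play the converted dialogue *)
Export["daveoutput.mid", daveoutput] (* save the notes as a MIDI file *)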
5) Hear the Final Product!
https://soundcloud.com/jamie-lim-14/dave-chappelle-standup-comedy-converted-to-music
Problems / Room for Improvement:
There were some minor difficulties during the process, such as fitting the quantified speech data into the MIDI note number range and then into a specific scale. The note numbers used by SoundNote cover the 128 MIDI keys from -60 through 67 (relative to middle C), while the quantified pitch values ranged from about 100 through 180.
Main Results:
The wider the dynamic range of the dialogue's volume, the louder and more dynamically varied the resulting music was.
The translation into music can be set to any of the seven musical modes (Locrian, Aeolian, Dorian, Ionian, Mixolydian, Lydian, Phrygian).
Future Work:
Exploring alternative ways to translate dialogue into music, such as incorporating note velocity (loudness) as part of the quantified data; a sketch follows below.
Putting all translations into the same musical key and tempo.
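A minimal sketch of the velocity idea mentioned above, assuming the loudness curve of the clip can stand in for note velocity; AudioLocalMeasurements, ArrayResample, and SoundNote's SoundVolume option are used here, and the variable names are hypothetical:
rms = AudioLocalMeasurements[davecomedy, "RMSAmplitude"]["Values"]; (* loudness of the speech over time *)
vols = Rescale[ArrayResample[rms, Length[davescalednote]]]; (* one volume in [0, 1] per note *)
Sound[MapThread[SoundNote[#1, 0.009, "Harp", SoundVolume -> #2] &, {davescalednote, vols}]]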
https://github.com/limjaeyoon/DialogueMusicalAnalysis.git
Attachments: