WOLFRAM COMMUNITY

[WSS19] OCR for Sheet Music

Daniel Csillag

Posted 7 years ago

## Overview

Given an image of sheet music, the goal is to generate an internal representation of it that can then be played, reprinted, or exported to external music notation software. To do this, we first split the sheet music into one image per staff and remove the staff lines; we then find bounding boxes for each musical component using image segmentation and classify them. All of this is then passed to a parser that produces the final sheet music.

## Finding the Staffs

Staff lines are long, horizontal, and thin; as such, they can be highlighted with a bottom-hat transform using a short vertical kernel, and then extracted with `ImageLines`. Then, by looking for close groups of five such lines, we can extract an image for each staff in the original score. We also find the median distance between the staff lines, both for padding the staff images and for reuse later.

```wolfram
detectStaffLineImages[image_] := Module[
  {staffLines, \[CapitalDelta], staffLineGroups},
  (* Keep only the detected lines that are within 10 degrees of horizontal. *)
  staffLines = Select[
    ImageLines[BottomHatTransform[image, BoxMatrix[{1, 0}]], Automatic, 0.0065][[All, 1]],
    (Abs@VectorAngle[Subtract @@ #, {-1, 0}] < 10 \[Degree]) &];
  (* Median vertical distance between neighbouring staff lines. *)
  \[CapitalDelta] = Median@Differences@Sort@Map[Mean, staffLines[[All, All, 2]]];
  (* Group lines whose mean heights are less than 5 spacings apart into staffs. *)
  staffLineGroups = Gather[
    Sort[staffLines, (Last@Mean@#1 > Last@Mean@#2) &],
    (Last@Mean@#1 - Last@Mean@#2 < 5 \[CapitalDelta]) &];
  (* Return one padded image per staff, together with the line spacing. *)
  {Map[ImageTrim[image, Flatten[#, 1], {0, 2.25 \[CapitalDelta]}] &, staffLineGroups],
   \[CapitalDelta]}
  ]
```

## Removing the Staff Lines

Staff lines are dark, long, thin, and horizontal. To remove them, we take the closing of the image with a short vertical kernel and the negated bottom-hat transform of the image with a long horizontal kernel, and combine the two by taking their pixel-wise minimum. Taking the pixel-wise maximum of that result with its own morphological binarization then gives an image of the staff with no staff lines and few artefacts.
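The `removeStaffLines` definition that follows calls two helpers, `imageMin` and `imageMax`, which are not defined anywhere in this post. A minimal sketch, assuming they are meant to be the pixel-wise minimum and maximum of two same-sized, single-channel (grayscale) images; the implementation in the original notebook may differ:

```wolfram
(* Assumed helpers, not taken from the original notebook. ImageApply hands the
   corresponding pixel values of both images to Min or Max. Both images must
   have the same dimensions and are assumed to be grayscale. *)
imageMin[a_Image, b_Image] := ImageApply[Min, {a, b}]
imageMax[a_Image, b_Image] := ImageApply[Max, {a, b}]

(* Small check on two 2x1 grayscale images. *)
ImageData[imageMin[Image[{{0.25, 0.75}}], Image[{{0.5, 0.125}}]]]
```

For a color scan, converting with `ColorConvert[image, "Grayscale"]` before calling these helpers keeps the single-channel assumption valid.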
```wolfram
(* imageMin and imageMax stand for the pixel-wise minimum and maximum of two
   images; their definitions are not included in the original post. *)
removeStaffLines[image_, \[CapitalDelta]_] := With[
  {withoutStaffLines = imageMin[
     Closing[image, BoxMatrix[{\[CapitalDelta]/6, 0}]],
     ColorNegate@BottomHatTransform[image, BoxMatrix[{0, \[CapitalDelta]}]]]},
  imageMax[withoutStaffLines, MorphologicalBinarize[withoutStaffLines, {0.66, 0.33}]]
  ]
```

## Finding Bounding Boxes for the Musical Components

### Image Segmentation

For the image segmentation, we use a SegNet [^segnet] trained on the DeepScores [^deepscores] [Segmentation Dataset](https://drive.google.com/drive/folders/1KFxqi0rO-bJrd03rLk87fF1iOmnjpaoG), with some preprocessing: each pair of images (original and segmented) was separated into individual staffs, and the symbol colors were negated (so that there would be more contrast with the black background).

### Finding the Bounding Boxes

To find the bounding boxes from the segmented image, we look for its morphological components using `ComponentMeasurements`. Before calling `ComponentMeasurements`, we blur and binarize the segmented image, as this usually yields more accurate results; we also discard any bounding boxes with an area of less than 25 pixels, as those are very likely just noise.

## Classifying the Musical Components

For the component classification, we use a `ClassifierFunction` trained on the DeepScores [^deepscores] [Classification Dataset](https://drive.google.com/file/d/1bdBrX0dAX734I3MA_6-wH_-N2eqq_tf_/view), cropping the images according to a lookup table. This is so that the images contain only the actual symbol, and the classifier doesn't learn the context instead of the symbol.

Unfortunately, the classifier trained during the Wolfram Summer Program misclassifies some images; however, it still provides something that can be worked with.

## Parsing the Sheet Music

Only very limited parsing could be done during the Wolfram Summer Program: we sort the notes, assign them pitches, and add the rests in as well.
### Finding the Pitches

The pitch of a note depends on its distance to the bottom of the staff and on a reference pitch, which is determined by the clef.

Because of the way we trimmed the staff images, we know that the first staff line (from the bottom) has the Y coordinate $2.25 \Delta$, where $\Delta$ is the distance between the staff lines in the original image. We also know, from the way we trimmed the staffs from the original page, that the distance between the staff lines in the resized image is fixed at $\delta = 10$.

`PitchNumber` calculates the distance from the note centroid to the bottom of the staff, normalized by half of the distance between the staff lines. `PitchToLetter` finds the name for a given output of `PitchNumber` relative to some reference pitch.

```wolfram
(* Number of half-spacings (line/space steps) between the note and the staff bottom. *)
PitchNumber[noteCentroid_, \[Delta]_, bottom_] :=
  (noteCentroid[[2]] - bottom)/(0.5 \[Delta])

(* "A1", "B1", ..., "G7". Note that the octave number in this list increments at A,
   not at C as in scientific pitch notation. *)
naturalNotes = Flatten[
  Table[# <> ToString[i] & /@ {"A", "B", "C", "D", "E", "F", "G"}, {i, 1, 7}], 1]

(* Step from the reference note by the rounded difference in pitch numbers. *)
PitchToLetter[pitchNumber_, referenceNumber_, referenceLetter_] :=
  RotateLeft[naturalNotes, FirstPosition[naturalNotes, referenceLetter] - 1][[
    Round[pitchNumber - referenceNumber] + 1]]
```

### Basic Parsing

In this case, the parsing is a simple sort: no dynamics, articulation, or even accidentals are taken into consideration.

```wolfram
(* NotationCentroid is not defined in this post; it is assumed to return the
   {x, y} centroid stored in a SheetMusicNote or SheetMusicRest. *)
ParseSheetMusic[pitchedNotes_, rests_] :=
  SortBy[Join[pitchedNotes, rests], NotationCentroid[#][[1]] &]
```

In the end, this is our final result:

```wolfram
{SheetMusicNote[{66.5, 64.}, "B5"],
 SheetMusicRest[{97., 45.5}, "restHBar"],
 SheetMusicRest[{97.5, 64.5}, "restHalf"],
 SheetMusicNote[{121., 64.}, "B5"],
 SheetMusicNote[{159., 64.}, "B5"],
 SheetMusicNote[{196.5, 64.}, "B5"],
 SheetMusicRest[{287., 49.5}, "rest32nd"],
 SheetMusicNote[{327.5, 50.}, "F4"],
 SheetMusicNote[{366., 59.5}, "A5"],
 SheetMusicRest[{456., 75.}, "restQuarter"]}
```
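As a worked example of the pitch lookup and the left-to-right sort described above, the sketch below repeats the three pitch definitions so that it runs on its own, then names and orders two notes. The staff bottom (`staffBottom = 30`), the two centroids, and the reference (`"E4"` at pitch number 0, as on the bottom line of an assumed treble clef) are illustrative assumptions, not values from the original notebook.

```wolfram
(* Definitions repeated from the post so that this block is self-contained. *)
PitchNumber[noteCentroid_, \[Delta]_, bottom_] :=
  (noteCentroid[[2]] - bottom)/(0.5 \[Delta])

naturalNotes = Flatten[
   Table[# <> ToString[i] & /@ {"A", "B", "C", "D", "E", "F", "G"}, {i, 1, 7}], 1];

PitchToLetter[pitchNumber_, referenceNumber_, referenceLetter_] :=
  RotateLeft[naturalNotes, FirstPosition[naturalNotes, referenceLetter] - 1][[
   Round[pitchNumber - referenceNumber] + 1]]

(* Hypothetical inputs: staff-line spacing of 10 px, bottom staff line at y = 30. *)
spacing = 10;
staffBottom = 30;
centroids = {{120., 50.}, {60., 35.}};

(* Name each note relative to "E4" at pitch number 0 (assumed treble clef). *)
named = {#, PitchToLetter[PitchNumber[#, spacing, staffBottom], 0, "E4"]} & /@ centroids;

(* Order the notes left to right by the x coordinate of their centroids. *)
sorted = SortBy[named, #[[1, 1]] &]
```

With these inputs the note one step above the bottom line is named `"F4"` and the note four steps above it `"B5"`; the octave label follows `naturalNotes`, whose octave number increments at A.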

POSTED BY: Daniel Csillag

10 Replies

Daniel Csillag

Posted 7 years ago

Wow, I thought the notebook had been uploaded here. In any case, it's available at https://github.com/dccsillag/wss-omr.

- Classification dataset link: it works; just download the file and ignore Google Drive being unable to preview `*.zip` files.
- Segmentation dataset link: available at https://repository.cloudlab.zhaw.ch/artifactory/deepscores/archives/2017/ (download the first archive). DeepScores itself is now rooted at https://tuggeluk.github.io/downloads/.
- The SegNet implementation should work fine.
- As for inputting a PDF, you need to convert it to a raster image first. Maybe use xpdf/mupdf?

This still isn't a robust system. I do plan on working on it at a later date, though.

POSTED BY: Daniel Csillag

Michael Kelly

Michael Kelly, Wolfram Research Inc.

Posted 7 years ago

I have not been able to do anything with these functions. For instance, `removeStaffLines` calls the undefined functions `imageMin` and `imageMax`. There is a reference to SegNet and its GitHub source at http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html, but the problem is that it is written for Python, not Mathematica. This post needs to be updated and explained in more detail.

POSTED BY: Michael Kelly

Larry Lange

Posted 7 years ago

Has anyone been able to create anything that would be helpful?

POSTED BY: Larry Lange

Michael Kelly

Michael Kelly, Wolfram Research Inc.

Posted 7 years ago

Daniel, thanks for writing up this material. Would it be possible for you to also supply the notebook that you used for this post? It would help immeasurably in replicating your results. I am having difficulty finding and applying the DeepScores NN. Thank you, Michael Kelly

POSTED BY: Michael Kelly

Michael Kelly

Michael Kelly, Wolfram Research Inc.

Posted 7 years ago

As Larry identified above, neither of us can access the segnet page on Google (https://drive.google.com/drive/folders/1KFxqi0rO-bJrd03rLk87fF1iOmnjpaoG).

POSTED BY: Michael Kelly

Larry Lange

Posted 7 years ago

I have sheet music in PDF format. I don't know how to get it into your program. Can someone provide an example?

POSTED BY: Larry Lange

Larry Lange

Posted 7 years ago

Could you post the method you used to load the image at the start of the process, along with the sheet music you used?

POSTED BY: Larry Lange

Larry Lange

Posted 7 years ago

> Classifying the Musical Components
>
> For the component classification, we use a ClassifierFunction trained on the DeepScores [^deepscores] [Classification Dataset](https://drive.google.com/file/d/1bdBrX0dAX734I3MA_6-wH_-N2eqq_tf_/view), cropping the images in relation to a lookup table - this is so that the images contain only the actual symbol, and the classifier doesn't learn the context instead of the symbol. Unfortunately, the classifier trained during the Wolfram Summer Program misclassifies some images; however, it does still provide something that can be worked with.

I am having a hard time getting to the [Classification Dataset](https://drive.google.com/file/d/1bdBrX0dAX734I3MA_6-wH_-N2eqq_tf_/view).

POSTED BY: Larry Lange

Eleazar Johannian

Posted 7 years ago

Thank you very much Daniel. Wonderful presentation!

POSTED BY: Eleazar Johannian

Larry Lange

Posted 7 years ago

Thank you very much. It is a big help to me. It shows me the great potential of the language. Since I am a complete novice in both the language and music theory, I was struggling with creating user functions.

POSTED BY: Larry Lange
