
WOLFRAM COMMUNITY


[WSC17] Finding Kiwi Calls In Audio Recordings

Isaac Chandler

Posted 9 years ago

Recognizing Kiwi Bird Calls in Audio Recordings

My project for WSC 2017 was identifying kiwi calls in audio recordings. The project can be broken down into 2 main steps:

1. Finding clips that contain noise that could be a kiwi
2. Identifying the clips that actually contain a kiwi call

Finding Clips

Data

The data is an audio recording taken overnight in Northland, New Zealand.

Filtering

I started off by filtering the audio to be between 1200 Hz and 3600 Hz in order to remove the majority of the noise: a low-pass filter at 3600 Hz followed by a high-pass filter at 1200 Hz. I then normalized the audio to make the volume consistent.

    audioProcessed = AudioNormalize[HighpassFilter[LowpassFilter[audio, Quantity[3600, "Hertz"]], Quantity[1200, "Hertz"]]]

At first I just took any intervals of audio whose RMS amplitude was above a certain threshold.

    AudioIntervals[audioProcessed, #RMSAmplitude > 0.02 &]

This yielded many very short clips, so I then extended each clip by 1 second in each direction and merged clips that were within 2 seconds of each other.

    int = ({#[[1]] - 1, #[[2]] + 1} & /@ AudioIntervals[audioProcessed, #RMSAmplitude > 0.02 &]) //.
      {pre___, {c_, d_}, {x_, y_}, post___} /; d > x - 2 :> {pre, {c, y}, post}

This still had the problem of containing many clips just over 2 seconds long, so I removed any clips shorter than 10 seconds, as kiwi calls are longer than ten seconds.

    Cases[int, {x_, y_} /; y - x > 10]

I then limited the length of each of the clips to exactly 10 seconds to make it easier for the machine learning.

    AudioTrim[#, 10] & /@ clips

Identifying calls

Unsupervised Learning

Since my data was unclassified, I initially tried to use unsupervised learning to cluster the audio clips, although this didn't yield any particularly meaningful results.

    FeatureSpacePlot[clips, LabelingFunction -> (#2[[2]] &), PerformanceGoal -> "Quality"]

The middle section of the plot is about 50% kiwi calls and 50% not kiwi calls, as is the circle around the outside, so I have no idea what the feature extractor is looking at.
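Going back to the extend-and-merge step under Filtering: the same idea can be sketched as a single pass over the intervals instead of a repeated pattern replacement. This is only an illustration under stated assumptions. `mergeIntervals`, its `pad` and `gap` arguments, and the sample intervals are hypothetical names and made-up numbers, not part of the original notebook.

```wolfram
(* Hypothetical helper, not from the original notebook:
   pad each {start, end} interval by pad seconds on both sides,
   then merge neighbours that are closer than gap seconds. *)
mergeIntervals[intervals_List, pad_: 1, gap_: 2] :=
 Fold[
  Function[{merged, next},
   If[merged =!= {} && merged[[-1, 2]] > next[[1]] - gap,
    (* Close enough: extend the last merged interval *)
    Append[Most[merged], {merged[[-1, 1]], Max[merged[[-1, 2]], next[[2]]]}],
    (* Otherwise start a new interval *)
    Append[merged, next]]],
  {},
  (* Pad first, then make sure the intervals are in time order *)
  SortBy[{#[[1]] - pad, #[[2]] + pad} & /@ intervals, First]]

(* Made-up intervals in seconds, for illustration only *)
merged = mergeIntervals[{{5, 6}, {8, 9}, {20, 21}, {40, 52}}]
(* {{4, 10}, {19, 22}, {39, 53}} *)

(* Keep only candidates longer than 10 seconds *)
Cases[merged, {x_, y_} /; y - x > 10]
(* {{39, 53}} *)
```

The fold visits each interval once, which may matter on a full overnight recording with many short detections. Using `Max` for the merged end point also covers the case where one padded interval sits entirely inside another.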
Classifying Data

Since unsupervised learning didn't work particularly well, I decided to manually classify every single clip. Doesn't that sound fun...

    clipId = 1;
    clipClasses = Range@Length[clips2];
    Dynamic@Row[{clipId, "/", Length[clips2],
       If[clipId <= Length[clips2], clips2[[clipId]], "DONE!"],
       Button["Yes", clipClasses[[clipId]] = "Kiwi"; clipId = clipId + 1],
       Button["No", clipClasses[[clipId]] = "NotKiwi"; clipId = clipId + 1]}]
    Dynamic[Row[{clipId - 1, clipClasses[[clipId - 1]]}]]

Neural Network - Take One

I used 200 of the clips as training data for the neural network and kept the rest as test data.

    data = Thread[clips -> clipClasses];
    training = RandomSample[data, 200];
    Counts[Values[training]]
    test = Complement[data, training];
    Counts[Values[test]]

I used a neural net to classify the audio clips as it was the best at classifying them.

    cf = Classify[training, Method -> "NeuralNetwork", PerformanceGoal -> "Quality"]
    cm = ClassifierMeasurements[cf, test]

The accuracy of the neural network was extremely poor though.

    cm["Accuracy"]
    0.559565

Neural Network - Take Two

I then tried downsampling the audio from 44.1 kHz to 10 kHz to reduce the amount of extraneous data the neural network has to work with.

    clipsSmall = AudioResample[#, Quantity[10, "Kilohertz"]] & /@ clips;

This reduced the amount of data without significantly changing the audio, but the accuracy got worse.

    cm["Accuracy"]
    0.453762

Well, back to the drawing board I guess.

Neural Network - Take Three

This time I tried a different approach. Since the neural network seemed to handle raw audio extremely poorly, I instead fed the spectrogram of each clip into the neural net as an image.

    data = Thread[Image[Abs[SpectrogramArray[#]]] & /@ clipsSmall -> clipClasses[[All, 1]]]

The image processing side has had a lot more work done on it, so this should work much better.

    cm["Accuracy"]
    0.820756

At this point I ran out of ways to improve the score and ran up against the limit of how accurately I was able to classify the clips myself, so I'm going to call that a success.
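For the training and test split used in the neural network attempts, one alternative is to shuffle the labelled data once with a fixed seed and cut it in two, so that all three takes are scored on the same held-out clips. The following is a minimal sketch on toy data. The integer stand-ins for clips, the seed value, and the 7/3 split are assumptions for illustration, not the values from the project.

```wolfram
(* Toy stand-ins for the labelled clips; the real data would be clip -> class rules *)
SeedRandom[1234];  (* arbitrary seed so the split is repeatable *)
data = Thread[Range[10] -> Join[ConstantArray["Kiwi", 4], ConstantArray["NotKiwi", 6]]];

(* Shuffle once, then cut: the first 7 for training, the rest for testing *)
{training, test} = TakeDrop[RandomSample[data], 7];

Counts[Values[training]]  (* class balance of the training set *)
Counts[Values[test]]      (* class balance of the test set *)

(* The two sets share no examples *)
Intersection[training, test]
(* {} *)
```

Looking at the class counts of the test set also gives a baseline: if one class makes up most of the test clips, always guessing that class already scores that fraction, which is a useful number to compare an accuracy figure against.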
Finding calls

Finally, finding the calls, which is simple now that we can find and classify potential calls.

    FindCalls[clip_] := Module[{int = {}, audioProcessed, clips, classes},
      audioProcessed = ProcessAudio[clip];
      int = ProcessAudioIntervals[audioProcessed];
      int = Cases[int, {x_?NumberQ, y_?NumberQ} /; y - x >= 10];
      clips = Which[
        Length[int] == 0, {},
        Length[int] == 1, AudioTrim[#, 10] & /@ {AudioTrim[audioProcessed, int]},
        Length[int] > 1, AudioTrim[#, 10] & /@ AudioTrim[audioProcessed, int]];
      classes = KiwiCallClassifier[
        Image[Abs[SpectrogramArray[AudioResample[#, Quantity[10, "Kilohertz"]]]]] & /@ clips];
      Thread[{Extract[clips, Position[classes, "Kiwi"]],
        Quantity[#, "Seconds"] & /@ Extract[int, Position[classes, "Kiwi"]][[All, 1]]}]]

    FindCalls[Import@"C:\\Users\\Isaac\\Desktop\\Programming\\data\\Kiwi Audio\\Processing\\20170604k-53.mp3"] // Grid

Reflections

Good

Actually finding clips with noise was quite a simple and easy task. I just messed around with the frequencies on the filters and the threshold for a while.

Bad

In hindsight I realize that this wasn't a problem that was particularly suited to unsupervised learning, as there are other, more general features than "kiwi" or "not kiwi" for a feature extractor to identify (although I still have no idea what it was doing).

Worst

Another thing that's important for undertaking a machine learning project like this is having a lot of data. If I do another project like this I'm definitely going to use more, already classified data instead of spending 2 hours listening to birds screaming and loud background noise.

This is the training data I used for the neural net: https://drive.google.com/file/d/0B4VdlZ57AG6BcXBMMi1NQnQyRms/view?usp=sharing
