Have you considered using truncated versions of any of the networks in the Wolfram Neural Net Repository as feature extractors?
I imagine a truncated version of the Deep Speech 2 network would make sense if you're interested in what was said:
https://resources.wolframcloud.com/NeuralNetRepository/resources/Deep-Speech-2-Trained-on-Baidu-English-Data
Something like Wolfram AudioIdentify V1, on the other hand, could be better if you're more interested in how it's said (similar to what you currently have):
https://resources.wolframcloud.com/NeuralNetRepository/resources/Wolfram-AudioIdentify-V1-Trained-on-AudioSet-Data
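To make the "truncated network as feature extractor" idea concrete, here is a toy sketch in Python. It is not Wolfram Language and not the real Deep Speech 2 or AudioIdentify architecture. The tiny two-layer classifier, its weights, and the helper names `dense`, `truncate` and `run` are all hypothetical. The point is that dropping the final classification layers leaves a network whose output is an intermediate representation (an embedding) that you can feed into your own model. In Wolfram Language the analogous step would typically be applying `NetTake` or `NetDrop` to a model loaded with `NetModel`.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Layer:
    """Hypothetical fully connected layer computing y = W x + b."""
    def apply(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return apply


def relu(x: Vector) -> Vector:
    """Elementwise ReLU activation."""
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    """Numerically stable softmax that turns scores into class probabilities."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def run(layers: Sequence[Layer], x: Vector) -> Vector:
    """Apply the layers in order, like evaluating a chain of layers."""
    for layer in layers:
        x = layer(x)
    return x


def truncate(layers: Sequence[Layer], n_drop: int) -> List[Layer]:
    """Drop the last n_drop layers; the remaining prefix acts as a feature extractor."""
    return list(layers[: len(layers) - n_drop])


# Toy "trained" classifier: hidden layer -> ReLU -> output layer -> softmax.
classifier = [
    dense([[1, 0], [0, 1], [1, -1]], [0, 0, 0]),
    relu,
    dense([[1, 1, 1], [0, 0, 1]], [0, 0]),
    softmax,
]

# Remove the output layer and softmax to keep the hidden ReLU activations.
extractor = truncate(classifier, 2)

features = run(extractor, [1.0, 2.0])  # [1.0, 2.0, 0.0]
probs = run(classifier, [1.0, 2.0])    # two class probabilities
```

The design choice here is to cut just before the task-specific head: the earlier layers capture general structure of the input, while the last layers are specialized to the original labels, which is why the truncated prefix tends to transfer better to a new task.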
Congratulations! This post is now featured in our Staff Pick column, as distinguished by a Featured Contributor badge on your profile. Thank you, keep it coming, and consider contributing your work to The Notebook Archive!
Wow, this is a super cool project. Nice job Dev!
Thanks!
This is really an interesting project! Nicely done!
Thanks for your encouragement!
Outstanding work!
Thanks for the support!
Great job Dev! Looking forward to seeing your work in the future.
Thanks! Hopefully I'll be able to expand this project in the future.
This is really interesting! Nice job with the layout of the outputs and the variables.