I don't know how to solve it either. Most of the coding I did when I was working had nothing to do with sound. However, the APIs for the sound unit in OS X look pretty straightforward, and I can call Mathematica code from within C using MathLink. Provided that latency is not too big an issue, it should not be that hard to wrap the Mathematica code in a suitable C (Objective-C, C++, Swift) wrapper. Since this will not be commercial code, the GUI can simply make use of the standard widgets.
What I want to do is take the sound input, process it into a suitable output (which may lag the input by several seconds), and play that output in real time. The bulk of the manipulation will take place in Mathematica/Wolfram Language.
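To make the delay part concrete, here is a rough sketch in plain C of a fixed-length delay line built on a ring buffer. It is only an illustration under my own assumptions: the names DelayLine, delay_init, delay_process and delay_free are made up for this post, samples are assumed to be mono floats, and neither the OS X audio callback nor the MathLink call into Mathematica is shown. The processing done in Mathematica would sit somewhere between reading the input and writing the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-length delay line: output lags input by `len` samples. */
typedef struct {
    float *buf;   /* ring buffer holding the most recent `len` input samples */
    size_t len;   /* delay in samples, e.g. 3 s at 44100 Hz = 132300 */
    size_t pos;   /* index of the oldest sample, also the next write slot */
} DelayLine;

/* Allocate a zero-filled (silent) buffer. Returns 0 on success, -1 on failure. */
int delay_init(DelayLine *d, size_t delay_samples)
{
    if (delay_samples == 0)
        return -1;
    d->buf = calloc(delay_samples, sizeof(float));
    if (d->buf == NULL)
        return -1;
    d->len = delay_samples;
    d->pos = 0;
    return 0;
}

/* Process one block of n samples: emit the oldest stored sample,
 * then overwrite it with the newest input sample. */
void delay_process(DelayLine *d, const float *in, float *out, size_t n)
{
    assert(d->buf != NULL);
    for (size_t i = 0; i < n; i++) {
        float delayed = d->buf[d->pos];
        d->buf[d->pos] = in[i];
        out[i] = delayed;
        d->pos = (d->pos + 1) % d->len;
    }
}

/* Release the buffer and reset the struct. */
void delay_free(DelayLine *d)
{
    free(d->buf);
    d->buf = NULL;
    d->len = 0;
    d->pos = 0;
}
```

In a real program, delay_process would be called once per audio buffer from whatever callback the sound API provides. The buffer is allocated up front in delay_init so that nothing is allocated while audio is running.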
I don't do Windows, but I think that the same thing could be done on that OS.
I will raise this issue at the WTC next week. Last year, I was told that audio processing would eventually be on a par with video processing, so I think it is on Wolfram's radar. With so much going on, it might take a bit of nudging to get the project moved up the queue, though.
I'm working on something else right now, but I will get to real-time sound processing using Mathematica eventually. If I get something working, I will post on Wolfram Community.
geo3rge