Computation of MFCC and Documentation of AudioLocalMeasurements?

There are many descriptions of the computation of MFCC vectors in the literature. They are, of course, the same conceptually but differ in a myriad of details including but not limited to the boundaries of the filters in the filter bank, the use (or not) of pre-emphasis, and the parameters of the DFT and the DCT transformations.

The documentation of the computation of MFCC by AudioLocalMeasurements is, shall we say, spare. In order to make use of Mathematica's MFCC vectors, one obviously needs to know what went into their computation. These details should be specific enough that one can reproduce the MFCC output of AudioLocalMeasurements using more basic Mathematica functions such as Log, PeriodogramArray, Fourier, FourierDCT, etc. Perhaps there is someone in this community who knows these details and would be willing to post them.

Thanks in advance for their help.

Cheers, Scott

POSTED BY: Scott Guthery
5 Replies

One can see that the internal function called to compute "MFCC" is Audio`AudioMeasurementsDump`oMFCC:

Trace[
  foo = AudioLocalMeasurements[audio, "MFCC"]["Values"],
  mfcc_Audio`AudioMeasurementsDump`oMFCC :> (murf = mfcc),
  TraceInternal -> True];
foo == murf
(*  True  *)

One can inspect the code of Audio`AudioMeasurementsDump`oMFCC with GeneralUtilities`PrintDefinitions:

GeneralUtilities`PrintDefinitions@Audio`AudioMeasurementsDump`oMFCC

It calls a compiled function:

Audio`AudioMeasurementsDump`compiledMelSpectrogram // 
 CompiledFunctionTools`CompilePrint

Of course the compiled code is printed without variable names or comments. It's not indecipherable, but one can get the function that was compiled with symbolic variable names:

(* Audio`AudioMeasurementsDump`compiledMelSpectrogram:
 symbolic representation *)
Begin["Audio`AudioMeasurementsDump`"];
compiledMelSpectrogram[[-2]]
End[]

That's as far as I can go. I'm not an expert in this field. I do not know what MFCC stands for.

POSTED BY: Michael Rogers

Many thanks: I didn't know how to do this 'looking inside' of functions. Running your code, it looks at first glance as though the answers are probably in there, but it will take me some time to unpack them (given the nested functions that are revealed).

Still it would be better if the documentation explained the options, and/or provided a reference.

BTW, MFCCs are mel-frequency cepstral coefficients, which describe the 'shape' of an audio signal's spectrum. They are a different way of decomposing the signal from a plain Fourier transform.
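For anyone following along, here is a minimal sketch of a generic textbook MFCC pipeline (frame, window, power spectrum, triangular mel filter bank, log, DCT-II). It is not the AudioLocalMeasurements implementation: the names hzToMel, melToHz, melFilterBank and textbookMFCC are hypothetical, and the mel formula, Hann window, frame size, hop, filter count and absence of pre-emphasis are all assumptions picked for illustration. These are exactly the details that vary between packages.

```wolfram
(* Assumed mel-scale convention; other packages use different constants *)
hzToMel[f_] := 2595. Log10[1 + f/700.];
melToHz[m_] := 700. (10^(m/2595.) - 1);

(* Triangular filters spaced evenly on the mel scale from 0 Hz to sr/2.
   Result: nFilt rows, nFFT/2 + 1 columns (nFFT assumed even). *)
melFilterBank[nFilt_, nFFT_, sr_] := Module[{edges, freqs},
  edges = melToHz /@ Subdivide[hzToMel[0.], hzToMel[sr/2.], nFilt + 1];
  freqs = Range[0, nFFT/2] sr/nFFT;
  Table[
   Clip[Min[
     (f - edges[[i]])/(edges[[i + 1]] - edges[[i]]),
     (edges[[i + 2]] - f)/(edges[[i + 2]] - edges[[i + 1]])], {0., 1.}],
   {i, nFilt}, {f, freqs}]];

(* samples: one channel as a list of reals; frameSize assumed even *)
textbookMFCC[samples_, sr_, frameSize_, hop_, nFilt_, nCoef_] :=
 Module[{frames, window, power, bank, energies},
  frames = Partition[samples, frameSize, hop]; (* drops a trailing partial frame *)
  window = N@Array[HannWindow, frameSize, {-1/2, 1/2}];
  (* one-sided power spectrum of each windowed frame *)
  power = (Abs[Fourier[window #,
         FourierParameters -> {1, -1}]][[;; frameSize/2 + 1]]^2) & /@ frames;
  bank = melFilterBank[nFilt, frameSize, sr];
  energies = power . Transpose[bank]; (* frames x filters *)
  (* log filter-bank energies, then DCT-II; keep the first nCoef coefficients *)
  (FourierDCT[Log[# + $MachineEpsilon], 2][[;; nCoef]]) & /@ energies];

(* Usage on a synthetic 440 Hz tone, one second at 16 kHz *)
sr = 16000;
samples = Table[Sin[2. Pi 440 t/sr], {t, 0, sr - 1}];
mfcc = textbookMFCC[samples, sr, 512, 256, 26, 13];
Dimensions[mfcc]
```

For a real Audio object one could presumably feed in First@AudioData[audio] and AudioSampleRate[audio] and compare the result against AudioLocalMeasurements[audio, "MFCC"]["Values"]. I would not expect the numbers to match until the window, filter edges, log base and DCT normalization have been aligned with whatever the built-in function actually uses.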

POSTED BY: Gareth Russell

Just discovered this post because... YES! I am collaborating with someone who calculates MFCCs using a different software package that begins with M. Those values bear no relationship to the ones produced by Mathematica, and I can't get the WL to produce anything like them. But unlike MMA, the other package has a detailed description of the methods used for its MFCCs. My personal feeling is that the lack of detailed documentation (including source citations) for the more sophisticated functions stops people taking the WL seriously as a professional tool. Too many of the functions are black boxes and, without replicability on other platforms, they can't be used for scientific research. Frustratingly, I would guess that the large package of audio processing features added in v11 came by incorporating an existing open source library, for which there probably is detailed documentation. (I believe the image processing functions similarly came from incorporating OpenCV.)

As I use Mathematica under an institutional license, I will hit up support@wolfram.com for some answers, and post back.

POSTED BY: Gareth Russell