
Computation of MFCC and Documentation of AudioLocalMeasurements?

There are many descriptions of the computation of MFCC vectors in the literature. They are, of course, the same conceptually but differ in a myriad of details including but not limited to the boundaries of the filters in the filter bank, the use (or not) of pre-emphasis, and the parameters of the DFT and the DCT transformations.

The documentation of the computation of MFCC by AudioLocalMeasurements is, shall we say, spare. To make use of Mathematica's MFCC vectors, one obviously needs to know what went into their computation. These details should be specific enough that one can reproduce the MFCC output of AudioLocalMeasurements using more basic Mathematica functions such as Log, PeriodogramArray, Fourier, FourierDCT, etc. Perhaps there is someone in this community who knows these details and would be willing to post them.
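As a point of comparison, here is a minimal sketch of one textbook MFCC pipeline built only from basic functions. Every choice in it is an assumption and not a statement of what AudioLocalMeasurements does internally: the 0.97 pre-emphasis coefficient, the Hann window, the absence of padding, the 2595 Log10[1 + f/700] mel mapping, the triangular filters, and the type 2 DCT. The names hzToMel, melToHz, melFilterBank and textbookMFCC are hypothetical helpers introduced for this illustration.

```wolfram
(* Mel scale in one common convention (assumed, not confirmed for Mathematica) *)
hzToMel[f_] := 2595. Log10[1 + f/700.];
melToHz[m_] := 700. (10^(m/2595.) - 1);

(* Triangular filters whose edges are equally spaced on the mel scale *)
melFilterBank[nFilters_, nFFT_, sr_] := Module[{edges, bins},
  edges = melToHz /@ Subdivide[hzToMel[0.], hzToMel[sr/2.], nFilters + 1];
  bins = Range[0, nFFT/2] sr/nFFT; (* frequency of each retained DFT bin *)
  Table[
   Clip[Min[(f - edges[[i]])/(edges[[i + 1]] - edges[[i]]),
     (edges[[i + 2]] - f)/(edges[[i + 2]] - edges[[i + 1]])], {0., 1.}],
   {i, nFilters}, {f, bins}]];

textbookMFCC[samples_, sr_, frameLen_: 1024, hop_: 512,
  nFilters_: 26, nCoeffs_: 13, preEmph_: 0.97] :=
 Module[{x, frames, window, power, bank, logEnergy},
  (* 1. pre-emphasis: y[n] = x[n] - a x[n-1] *)
  x = Prepend[Rest[samples] - preEmph Most[samples], First[samples]];
  (* 2. framing without padding; incomplete trailing frames are dropped *)
  frames = Partition[x, frameLen, hop];
  window = N[Array[HannWindow, frameLen, {-1/2, 1/2}]];
  (* 3. power spectrum, keeping bins 0 .. frameLen/2 *)
  power = Abs[Fourier[# window,
        FourierParameters -> {1, -1}]][[;; frameLen/2 + 1]]^2 & /@ frames;
  (* 4. log of the mel filter-bank energies *)
  bank = melFilterBank[nFilters, frameLen, sr];
  logEnergy = Log[power . Transpose[bank] + $MachineEpsilon];
  (* 5. type 2 DCT of each frame, keeping the first nCoeffs coefficients *)
  FourierDCT[#, 2][[;; nCoeffs]] & /@ logEnergy]

(* Usage on a synthetic 440 Hz tone, one second at 16 kHz *)
sr = 16000;
samples = Table[Sin[2 Pi 440. t/sr], {t, 0, sr - 1}];
Dimensions[textbookMFCC[samples, sr]]
```

Because each step is a separate line, any one of the assumed choices can be swapped out and the result compared against AudioLocalMeasurements[audio, "MFCC"]["Values"] to see which variant, if any, comes close.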

Thanks in advance for their help.

Cheers, Scott

POSTED BY: Scott Guthery
5 Replies

One can see that the internal function called to compute "MFCC" is Audio`AudioMeasurementsDump`oMFCC:

Trace[
  foo = AudioLocalMeasurements[audio, "MFCC"]["Values"],
  mfcc_Audio`AudioMeasurementsDump`oMFCC :> (murf = mfcc),
  TraceInternal -> True];
foo == murf
(*  True  *)

One can inspect the code of Audio`AudioMeasurementsDump`oMFCC with GeneralUtilities`PrintDefinitions:

GeneralUtilities`PrintDefinitions@Audio`AudioMeasurementsDump`oMFCC

It calls a compiled function:

Audio`AudioMeasurementsDump`compiledMelSpectrogram // 
 CompiledFunctionTools`CompilePrint

Of course the compiled code is printed without variable names or comments. It's not indecipherable, but one can get the function that was compiled with symbolic variable names:

(* Audio`AudioMeasurementsDump`compiledMelSpectrogram:
 symbolic representation *)
Begin["Audio`AudioMeasurementsDump`"];
compiledMelSpectrogram[[-2]]
End[]

That's as far as I can go. I'm not an expert in this field. I do not know what MFCC stands for.

POSTED BY: Michael Rogers

Many thanks: I didn't know how to do this 'looking inside' of functions. Having run your code, at first glance it looks as though the answers are probably in there, but it will take me some time to unpack them (given the nested functions that are revealed).

Still it would be better if the documentation explained the options, and/or provided a reference.

BTW, MFCC stands for mel-frequency cepstral coefficients. They describe the 'shape' of an audio signal's spectrum, which is a different way of decomposing the signal from a plain Fourier transform.
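For readers who have not met the term, one common textbook convention (an assumption here, and not necessarily the one Mathematica uses) maps frequency $f$ in Hz to mels and then takes a type 2 cosine transform of the log energies $E_j$ collected by $M$ mel-spaced filters:

```latex
m(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
c_k = \sum_{j=1}^{M} \log(E_j)\,
      \cos\!\left[\frac{\pi k}{M}\left(j - \tfrac{1}{2}\right)\right],
\quad k = 0, 1, \dots, K-1 .
```

Here $K$ is the number of coefficients kept per frame. Implementations differ in the constants of $m(f)$, the filter shapes, and the scaling of $c_k$, which is why values from different packages need not match.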

POSTED BY: Gareth Russell

Just discovered this post because... YES! I am collaborating with someone who calculates MFCCs using a different software package that begins with M. Those values bear no relationship to the ones produced by Mathematica, and I can't get the WL to produce anything like them. But unlike MMA, the other package has a detailed description of methods used for their MFCCs. My personal feeling is that the lack of detailed documentation (including source citations) for the more sophisticated functions stops people taking the WL seriously as a professional tool. Too many of the functions are black boxes, and without replicability in other platforms, can't be used for scientific research. Frustratingly, I would guess that the large package of audio processing features added in v11 came by incorporating an existing open source library, for which there probably is detailed documentation. (I believe the image processing functions similarly came from incorporating OpenCV.)

As I use Mathematica under an institutional license, I will hit up support@wolfram.com for some answers, and post back.

POSTED BY: Gareth Russell

An update: Wolfram support said that the information on how the MFCCs are calculated is 'not available', but they will suggest the relevant team add them in the documentation of some future version. This is… not great.

POSTED BY: Gareth Russell

For those following this exciting saga, a little additional investigation suggests that 1) padding is being applied even when Padding is set to 0, 2) pre-emphasis is also being applied, but the parameter doesn't seem to be the standard 0.97, 3) the filter bank is logarithmic throughout, not just at the top end, and 4) the FourierDCT seems to be of Type 2 rather than the standard Type 3.

All that said, I still can't reproduce the MFCC values coming out of AudioLocalMeasurements so all of the above may just be a bunny trail.
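Two of the hypotheses above, the pre-emphasis coefficient and the DCT type, are easy to isolate with basic functions. The following is only a sketch: preEmphasize is a hypothetical helper, and logEnergy is random stand-in data, not output from AudioLocalMeasurements. What is documented is that FourierDCT[list] is equivalent to FourierDCT[list, 2], and that in Mathematica's normalization the type 3 transform is the inverse of type 2.

```wolfram
(* Pre-emphasis y[n] = x[n] - a x[n-1]; a is the coefficient to experiment with *)
preEmphasize[x_, a_] := Prepend[Rest[x] - a Most[x], First[x]];

(* Stand-in for one frame of log filter-bank energies (random, for illustration) *)
logEnergy = Log[RandomReal[{1, 2}, 26]];

dct2 = FourierDCT[logEnergy, 2];
dct3 = FourierDCT[logEnergy, 3];

(* The default type is 2, so this difference should be zero *)
Max[Abs[FourierDCT[logEnergy] - dct2]]

(* Type 3 undoes type 2, so this should be at round-off level *)
Max[Abs[FourierDCT[dct2, 3] - logEnergy]]

(* The two types generally give different coefficient vectors *)
Max[Abs[dct2 - dct3]]
```

Applying preEmphasize with a range of candidate coefficients before the remaining steps, and comparing each result with the built-in values, is one way to check hypothesis 2 without knowing the internal code.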

It certainly would be nice if Wolfram at least cited the paper from which they took their MFCC algorithm. I don't think that is asking too much. Without enough detail about the computation to reconstruct it, conclusions and results based on the values are on very shaky ground.

POSTED BY: Scott Guthery