Computation of MFCC and Documentation of AudioLocalMeasurements?

Posted 4 years ago
There are many descriptions of the computation of MFCC vectors in the literature. They are, of course, the same conceptually but differ in a myriad of details including but not limited to the boundaries of the filters in the filter bank, the use (or not) of pre-emphasis, and the parameters of the DFT and the DCT transformations.
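To make those degrees of freedom concrete, here is a minimal sketch of one textbook MFCC pipeline for a single frame, with each of the details above exposed as a parameter. It is written in plain Python only so that every arithmetic step is explicit. Everything in it is an assumption chosen for illustration: the mel formula, the Hamming window, the 0.97 pre-emphasis coefficient, the filter count, and the unnormalized DCT-II. It is not a description of what AudioLocalMeasurements does.

```python
import cmath
import math

def hz_to_mel(f):
    # One common mel formula; other descriptions use different constants.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def pre_emphasize(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]; alpha = 0 turns pre-emphasis off.
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def power_spectrum(frame):
    # Hamming window, then a naive O(N^2) DFT; keeps bins 0..N/2.
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    xw = [a * b for a, b in zip(frame, window)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate, f_lo=0.0, f_hi=None):
    # Triangular filters whose edges are equally spaced on the mel scale.
    f_hi = sample_rate / 2.0 if f_hi is None else f_hi
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    edges = [mel_to_hz(m_lo + (m_hi - m_lo) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct_ii(v, n_out):
    # Unnormalized DCT-II, truncated to the first n_out coefficients.
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13, alpha=0.97):
    spectrum = power_spectrum(pre_emphasize(frame, alpha))
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-12) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_coeffs)

# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = mfcc_frame(frame, 8000)
```

Changing any one of these choices (alpha, the window, the filter edges, the DCT type or its normalization) changes the resulting coefficients, which is why the exact recipe matters.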

The documentation of the computation of MFCC by AudioLocalMeasurements is, shall we say, spare. In order to make use of Mathematica's MFCC vectors one obviously needs to know what went into their computation. These details should be specified precisely enough that one can reproduce the MFCC output of AudioLocalMeasurements using more basic Mathematica functions such as Log, PeriodogramArray, Fourier, FourierDCT, etc. Perhaps there is someone in this community who knows these details and would be willing to post them.

Thanks in advance for their help.

Cheers, Scott

For those following this exciting saga, a little additional investigation suggests that 1) padding is being applied even when Padding is set to 0, 2) pre-emphasis is also being applied but the parameter doesn't seem to be the standard 0.97, 3) the filter bank is logarithmic throughout, not just at the top end, and 4) the FourierDCT seems to be of Type 2, which is Mathematica's default and, as far as I can tell, also the type used in most published MFCC formulations.
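Since point 4 turns on which DCT variant is in play, here is a small sketch of the textbook unnormalized DCT-II and DCT-III, in plain Python so the formulas are visible. The function names and the test vector are hypothetical and for illustration only. Mathematica's FourierDCT applies its own normalization, so scale factors would need to be checked against its documentation before comparing numbers.

```python
import math

def dct_ii(x):
    # X[k] = sum_n x[n] * cos(pi * k * (n + 1/2) / N)
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def dct_iii(x):
    # X[k] = x[0]/2 + sum_{n>=1} x[n] * cos(pi * n * (k + 1/2) / N)
    n = len(x)
    return [0.5 * x[0]
            + sum(x[i] * math.cos(math.pi * i * (k + 0.5) / n) for i in range(1, n))
            for k in range(n)]

x = [1.0, 2.0, 0.5, -1.0]            # arbitrary test vector
type2 = dct_ii(x)
type3 = dct_iii(x)
round_trip = dct_iii(dct_ii(x))      # equals (N/2) * x for these definitions
```

The two transforms give different coefficients for the same input, and with these definitions DCT-III undoes DCT-II up to a factor of N/2. So a mismatch in type, or only in normalization, is enough to make otherwise identical pipelines disagree.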

All that said, I still can't reproduce the MFCC values coming out of AudioLocalMeasurements so all of the above may just be a bunny trail.

It certainly would be nice if Wolfram at least cited the paper from which they took their MFCC algorithm. I don't think that is asking too much. Without enough detail to reconstruct the computation, conclusions and results based on the values are on very shaky ground.

