MFCC feature dimensions

shahdloo · July 14, 2018, 12:33pm

The documentation for the "audiofile_to_input_vector" function says that "MFCC features
at every 0.01s time step with a window length of 0.025s" are calculated. I tried to confirm this statement.
I have a 16kHz wav file containing 9631014 samples. The MFCC features I get from the "audiofile_to_input_vector" function have dimension 30097 x 494, which I read as ceil(9631014/320) x (26 + 2*26*9).
I conclude that 494 MFCC features are extracted for every 320 samples, which at 16kHz is a 0.02s time step. Is my reasoning correct? Is the time step really 0.02s instead of 0.01s?

shahdloo · July 16, 2018, 9:06pm

Figured out the answer. This is due to the parameter "BiRNN stride = 2", which keeps every other feature frame, resulting in an actual time step of 0.02s.
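To double-check the numbers, here is a rough sketch of the arithmetic in plain Python. It does not call the real feature code: the function name `expected_feature_shape` is made up for illustration, and the frame-count formula (one frame plus one per step, with the last partial frame padded) is an assumption about how the framing is done. The 26 coefficients and 9 context frames on each side are taken from the 494 = 26 + 2*26*9 breakdown above.

```python
def expected_feature_shape(num_samples, sample_rate=16000, win_len=0.025,
                           win_step=0.01, stride=2, n_cep=26, n_context=9):
    """Estimate (time steps, features per step) for the strided MFCC output.

    Hypothetical helper for illustration only; it mirrors the arithmetic
    discussed in this thread, not the library's actual code path.
    """
    frame_len = int(round(win_len * sample_rate))    # 400 samples at 16kHz
    frame_step = int(round(win_step * sample_rate))  # 160 samples at 16kHz

    # Assumed framing convention: the last partial frame is padded,
    # so the frame count is 1 + ceil((N - frame_len) / frame_step).
    if num_samples <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + -(-(num_samples - frame_len) // frame_step)

    # A stride of 2 keeps every other frame (like features[::2]).
    n_kept = len(range(0, n_frames, stride))

    # Each kept frame is concatenated with n_context frames on each side.
    width = n_cep + 2 * n_cep * n_context
    return n_kept, width


shape = expected_feature_shape(9631014)
print(shape)  # (30097, 494)
```

Under these assumptions there are 60193 frames at the 0.01s step, and keeping every other one gives the 30097 rows observed, so the effective step is 0.01s * 2 = 0.02s.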

reuben · July 16, 2018, 9:57pm

Yep! We should probably experiment with computing features over 20ms windows instead of using the stride to see how it performs…