Question about the MFCC computation changes

Hi,

I noticed that in the latest release (0.4) the MFCCs are computed differently: the time step between windows is doubled instead of using a stride of 2 over the features, and a Hamming window function is used.

Do these changes have a large impact on the model's WER? If you have measured their effect and are willing to share any information about it, it would help me make my own decisions.

Thanks a lot,
Mar

We had good preliminary results and are currently training models on our full training dataset to validate them. If everything goes OK this should go out with the v0.4 release.

@reuben
Could you please give us more context here?
For example, is it a throughput/latency tradeoff?

In my use case, I am looking for the best possible transcription accuracy (latency and compute power are secondary). Should I try going back to winlen=0.025 and winstep=0.01?

The original decision to throw away every other feature window was based on an experiment that showed it did not degrade accuracy, so it was performance for free.

Instead of throwing away features, in v0.4 we increased the feature computation window to 2x the previous value. If you go back to 0.025s/0.01s you'll be doing more work, but may achieve better accuracy in the end. For us the impact was negligible, but this was a while ago.
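
To make the tradeoff concrete, here is a rough sketch of the frame-count arithmetic. It is not the DeepSpeech feature code: `num_frames` is a hypothetical helper, the 16 kHz sample rate is an assumption, and for simplicity it keeps the window length at 0.025s in both cases and only doubles the step to 0.02s, which may not match the exact values used in v0.4.

```python
def num_frames(num_samples, sample_rate, winlen, winstep):
    """Count the full analysis windows that fit in a signal (no padding).

    Hypothetical helper for illustration only, not part of any library.
    """
    win = round(winlen * sample_rate)    # window length in samples
    step = round(winstep * sample_rate)  # hop between windows in samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // step


SAMPLE_RATE = 16000        # assumed sample rate
one_second = SAMPLE_RATE   # number of samples in one second of audio

# Old approach: compute a window every 10 ms, then keep every other one.
computed_old = num_frames(one_second, SAMPLE_RATE, winlen=0.025, winstep=0.01)
kept_old = len(range(computed_old)[::2])

# New approach (assumed values): compute a window every 20 ms, keep all.
computed_new = num_frames(one_second, SAMPLE_RATE, winlen=0.025, winstep=0.02)

print("old: computed", computed_old, "kept", kept_old)
print("new: computed", computed_new, "kept", computed_new)
```

Under these assumptions both approaches feed the same number of frames per second to the model, but the old one computes roughly twice as many windows first. Going back to 0.025s/0.01s without dropping frames means the model sees about twice as many time steps, which is where the extra work comes from.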