I got this MFCC error when running DeepSpeech:
Missing 5 bands starting at 0 in mel-frequency design. Perhaps too many channels or not enough frequency resolution in spectrum. (input_length: 257 input_sample_rate: 44100 output_channel_count: 40 lower_frequency_limit: 20 upper_frequency_limit: 4000
What should I do to improve the frequency resolution?
Each of my WAV files is 2 seconds long.

Thank you

Is that at inference or training?

It is at training, from Epoch 0. I am using the transfer learning 2 branch. My utterances mix English and Chinese, with more English words than Chinese ones. @lissyx

Is your whole dataset using 44.1kHz stereo? Have you made any changes to the code?

No, I converted my whole dataset to 16-bit, 44.1kHz mono (but some of the files might have a lower or higher sample rate than that).
Example of the properties of one of my WAV files:

I did not make any changes to the code.
Should I make them all the same size? @lissyx

Please avoid screenshots.

This shows 48kHz, your error reports 44.1kHz, and you say it has been converted to 16kHz. Could you make sure your whole dataset is mono 16kHz?
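One way to check this without screenshots is to read the header of every file and group the files by format. The sketch below is only an illustration using Python's standard `wave` module; the function name `summarize_wav_formats` is made up for this example and is not part of DeepSpeech, and it assumes the files are plain PCM WAV (the `wave` module raises an error on other encodings).

```python
import collections
import pathlib
import wave


def summarize_wav_formats(root):
    """Group the .wav files under `root` by (sample rate, channels, bit depth).

    Hypothetical helper for illustration; assumes plain PCM WAV files.
    """
    formats = collections.defaultdict(list)
    for path in sorted(pathlib.Path(root).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            key = (
                wav.getframerate(),      # sample rate in Hz
                wav.getnchannels(),      # 1 = mono, 2 = stereo
                wav.getsampwidth() * 8,  # bytes per sample -> bits
            )
        formats[key].append(path)
    return dict(formats)
```

Calling `summarize_wav_formats("path/to/dataset")` returns a dictionary with one key per distinct format. If it has more than one key, the dataset mixes formats and the odd files need to be converted.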

So the whole dataset has to be 16khz instead of 44.1kHz? @lissyx

No, the whole dataset has to be in the same format. Our default setting and what we train on is 16kHz, so it’s easier for you. Just don’t mix different sample rates.
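To see why the sample rate matters for this particular error, here is a rough sketch of the arithmetic, using the values from the error message (257 spectrum bins, 40 channels, 20 Hz to 4000 Hz). It assumes the common mel formula `1127 * ln(1 + f / 700)` and counts the slots between consecutive mel-spaced edge frequencies that contain no FFT bin at all. It is a simplified stand-in, not the exact check TensorFlow performs, so the count it gives need not match the "5 bands" in the message. The function name `count_empty_mel_slots` is made up for this example.

```python
import math


def hz_to_mel(hz):
    # Common mel-scale formula (an assumption of this sketch).
    return 1127.0 * math.log(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (math.exp(mel / 1127.0) - 1.0)


def count_empty_mel_slots(sample_rate, spectrum_bins=257, channels=40,
                          low_hz=20.0, high_hz=4000.0):
    """Return (FFT bin width in Hz, number of mel slots with no FFT bin)."""
    # The spectrum covers 0 Hz up to sample_rate / 2 in (spectrum_bins - 1) steps.
    bin_hz = 0.5 * sample_rate / (spectrum_bins - 1)
    # Edge frequencies, equally spaced on the mel scale between the limits.
    mel_low = hz_to_mel(low_hz)
    step = (hz_to_mel(high_hz) - mel_low) / (channels + 1)
    edges = [mel_to_hz(mel_low + step * i) for i in range(channels + 2)]
    bin_freqs = [k * bin_hz for k in range(spectrum_bins)]
    empty = 0
    for lo, hi in zip(edges, edges[1:]):
        # A slot is "empty" when no FFT bin frequency falls inside it.
        if not any(lo <= f < hi for f in bin_freqs):
            empty += 1
    return bin_hz, empty
```

At 16kHz each FFT bin is 31.25 Hz wide, which is narrower than the lowest mel slots, so every slot gets at least one bin. At 44.1kHz the bins are about 86 Hz wide, so some of the low-frequency slots get none, which is the situation the error message describes.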
