Gibberish inference result for GSM audio

sakhawatsumit · August 18, 2018, 6:26am

I trained the model on 300 hours of GSM audio. The validation result is pretty good: 9% CER. I converted the audio using ‘sox input.wav -r 8000 -c1 output.gsm lowpass 4000 compand 0.02,0.05 -60,-60,-30,-10,-20,-8,-5,-8,-2,-8 -8 -7 0.05’.

But the test result is much worse: almost every character is predicted wrong.

kdavis · August 20, 2018, 12:47pm

GSM is an unsupported audio file format. Currently only wav is supported.

You should convert your GSM audio to 16 kHz, 16-bit, mono wav.
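For anyone following along, here is a minimal sketch of that conversion step, not an official recipe. It assumes sox is installed with GSM support and that the files use the .gsm extension so sox can detect the format. The helper names `sox_to_wav_command` and `is_expected_wav` are made up for illustration. The first one only builds the sox argument list, the second one uses Python's standard `wave` module to check that a file really is 16 kHz, 16-bit, mono.

```python
import wave


def sox_to_wav_command(src, dst):
    """Build a sox argument list that writes dst as 16 kHz, 16-bit, mono PCM.

    Hypothetical helper: it only builds the command and does not run it.
    Output format options must come before the output file name.
    """
    return [
        "sox", src,
        "-r", "16000",            # output sample rate in Hz
        "-b", "16",               # bits per sample
        "-c", "1",                # one channel (mono)
        "-e", "signed-integer",   # PCM encoding
        dst,
    ]


def is_expected_wav(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV header matches the expected format.

    Hypothetical helper: it checks the header only, not the audio content.
    """
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == rate
            and wav_file.getsampwidth() == sample_width_bytes
            and wav_file.getnchannels() == channels
        )
```

The argument list can be passed to `subprocess.run(cmd, check=True)`, and `is_expected_wav("output.wav")` can then be run over the converted files before inference. Note that resampling 8 kHz GSM audio to 16 kHz only changes the container format; it does not bring back the frequencies above 4 kHz that were removed during the GSM conversion.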