Gibberish inference result for GSM audio

I trained the model on 300 hours of GSM audio. The validation result is pretty good: 9% CER. I converted the audio using ‘sox input.wav -r 8000 -c1 output.gsm lowpass 4000 compand 0.02,0.05 -60,-60,-30,-10,-20,-8,-5,-8,-2,-8 -8 -7 0.05’.

But the test result is much worse: almost every character is predicted wrong.

GSM is an unsupported audio file format. Currently only wav is supported.

You should convert your GSM audio to 16 kHz, 16-bit, mono WAV.
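
For example, something along these lines should do the conversion with sox. This is only a sketch: it assumes your sox build can read GSM (look for "gsm" in the format list printed by sox --help), and gsm_to_wav is just a helper name made up for this post, not part of any toolkit.

```shell
#!/bin/sh
# Hypothetical helper: convert one .gsm file to 16 kHz, 16-bit, mono WAV.
# Assumes sox was built with GSM support.
gsm_to_wav() {
    src="$1"
    dst="${src%.gsm}.wav"   # same name, .wav extension
    # -r 16000           resample to 16 kHz
    # -b 16 -e signed-integer   16-bit signed PCM
    # -c 1               mono
    sox "$src" -r 16000 -b 16 -e signed-integer -c 1 "$dst"
}
```

To convert a whole directory you could run: for f in *.gsm; do gsm_to_wav "$f"; done. Afterwards, soxi on one of the output files should report the sample rate, bit depth and channel count so you can confirm they match.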