I have trained the model on 300 hours of GSM audio. The validation result is pretty good: 9% CER. I converted the audio using 'sox input.wav -r 8000 -c1 output.gsm lowpass 4000 compand 0.02,0.05 -60,-60,-30,-10,-20,-8,-5,-8,-2,-8 -8 -7 0.05'.
But the test result is much worse: almost every character is predicted wrong.
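
For anyone who wants to reproduce the conversion step, here is a minimal sketch of how the sox command above could be wrapped in Python so that the same effects chain is applied to a whole directory. The helper names (`build_sox_command`, `convert_directory`) are made up for illustration, and it assumes `sox` is on the PATH and was built with GSM support.

```python
import subprocess
from pathlib import Path

# Effects chain copied verbatim from the sox command in the post.
SOX_EFFECTS = [
    "lowpass", "4000",
    "compand", "0.02,0.05",
    "-60,-60,-30,-10,-20,-8,-5,-8,-2,-8",
    "-8", "-7", "0.05",
]


def build_sox_command(src, dst):
    """Return the sox argument list for one file (hypothetical helper).

    "-c", "1" is the same option as the "-c1" used in the post.
    """
    return ["sox", str(src), "-r", "8000", "-c", "1", str(dst), *SOX_EFFECTS]


def convert_directory(wav_dir, gsm_dir):
    """Convert every .wav in wav_dir to .gsm in gsm_dir (hypothetical helper)."""
    out_dir = Path(gsm_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        target = out_dir / (wav.stem + ".gsm")
        # check=True raises CalledProcessError if sox exits with an error.
        subprocess.run(build_sox_command(wav, target), check=True)
```

Calling `convert_directory("wav", "gsm")` (directory names are placeholders) would run the identical conversion on every file in the folder.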