I need a model that performs well on 8 kHz audio. I tried upsampling from 8 kHz to 16 kHz (for the current DeepSpeech model), but most of the time this gives a very poor transcription.
Has anybody trained an 8 kHz model and can share some insights? Which data did you use? Is it good practice to take the Common Voice data, downsample it, and use it for training?
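To make the downsampling question concrete, here is a rough sketch of what I have in mind: low-pass filter the 16 kHz samples below the new 4 kHz Nyquist limit, then keep every second sample. The function names, the 63-tap Hamming-windowed filter, and the 3.6 kHz cutoff are my own illustrative choices, not anything from DeepSpeech or Common Voice, and a real pipeline would more likely use an existing resampler. This also only changes the sample rate; it does not reproduce telephone codec or channel effects.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.225):
    """Hamming-windowed sinc low-pass FIR filter.

    cutoff is in cycles per input sample (0.225 * 16000 Hz = 3600 Hz).
    num_taps is assumed to be odd so the filter has a center tap.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            sinc = 2 * cutoff
        else:
            sinc = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_2(samples, taps=None):
    """Low-pass filter, then keep every second sample (e.g. 16 kHz -> 8 kHz)."""
    if taps is None:
        taps = lowpass_taps()
    half = len(taps) // 2
    out = []
    # Only compute the filtered value at the positions we keep.
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):  # treat samples outside the clip as silence
                acc += t * samples[j]
        out.append(acc)
    return out


def tone(freq_hz, rate_hz=16000, n=1600):
    """Hypothetical test signal: a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


if __name__ == "__main__":
    kept = downsample_by_2(tone(1000))     # inside the 8 kHz band, should survive
    removed = downsample_by_2(tone(6000))  # above 4 kHz, should be filtered out
    print(len(kept), max(abs(v) for v in kept[100:700]))
    print(len(removed), max(abs(v) for v in removed[100:700]))
```

The samples here are plain floats; reading and writing actual WAV files (and converting to 16-bit integers) is left out to keep the sketch short.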
Any tips for telephony speech recognition? I would be thankful!