Telephony Speech Recognition

I need a model that performs well on 8 kHz audio. I tried upsampling from 8 kHz to 16 kHz (for the current DeepSpeech model), but most of the time this ends up with a very poor transcription.
Has anybody trained an 8 kHz model and can share some insights? Which data did you use? Is it good practice to take the Common Voice data, downsample it, and use it for training?

Any tips for telephony speech recognition? I would be thankful!
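
To make the downsampling idea concrete, here is a rough sketch of what I mean by taking 16 kHz audio down to 8 kHz: low-pass filter below the new Nyquist frequency (4 kHz), then keep every second sample. The function names, the 63-tap filter length, and the 0.22 cutoff are my own assumptions for illustration, not something taken from DeepSpeech or Common Voice tooling, and a resampling library would normally do this step. It also does not model codec or channel effects of a real phone line.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.22):
    """Windowed-sinc low-pass FIR filter (hypothetical helper).

    cutoff is in cycles per input sample; 0.22 at 16 kHz is about 3.5 kHz,
    chosen (as an assumption) to stay below the 4 kHz Nyquist limit of 8 kHz audio.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response, with the x == 0 limit handled explicitly.
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce ripple from truncating the sinc.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def downsample_by_2(samples, taps=None):
    """Filter a list of float samples and keep every second one (16 kHz -> 8 kHz)."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        # Filter centered on input sample i; samples outside the signal count as zero.
        for k, t in enumerate(taps):
            j = i + k - half
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out
```

With this, a 1 kHz tone sampled at 16 kHz should pass through almost unchanged, while a 6 kHz tone (which would alias at 8 kHz) should be strongly attenuated.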

What language do you need? English?

I’m interested in English and German models

NeMo Conformer-CTC is good for telephony.

Does this model work on 8 kHz audio? I saw in the docs that it is trained on 16 kHz audio…

Yes, the models were trained on both and work on both too.

Thanks @nshmyrev
Do you have a link to the model?
Do they provide some example scripts for testing it?

Check https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large