Telephony Speech Recognition

Bleau · November 28, 2021, 8:23am

I need a model that performs well on 8 kHz audio. I tried upsampling from 8 kHz to 16 kHz (for the current DeepSpeech model), but most of the time this gives a very poor transcription.
Has anybody trained an 8 kHz model and can share some insights? Which data did you use? Is it good practice to take the Common Voice data, downsample it, and use it for training?
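
For reference, here is a minimal sketch of the downsampling step asked about above, assuming the 16 kHz audio is already loaded as a mono list of float samples. The names `lowpass_taps` and `downsample_by_2` and the filter settings (63 taps, cutoff around 3.6 kHz) are illustrative assumptions, not part of DeepSpeech or the Common Voice tooling. It low-pass filters first so that content above 4 kHz does not alias, then keeps every second sample.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.225):
    """Windowed-sinc low-pass FIR. `cutoff` is a fraction of the input rate
    (0.225 * 16000 Hz = 3600 Hz, just under the 4 kHz Nyquist limit of 8 kHz audio)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), handling the x == 0 limit.
        h = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window to reduce the ripple caused by truncating the sinc.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at 0 Hz


def downsample_by_2(samples, taps=None):
    """Low-pass filter, then keep every second sample (e.g. 16 kHz -> 8 kHz)."""
    taps = taps or lowpass_taps()
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + k - half  # filter is centred on sample i
            if 0 <= j < len(samples):  # samples outside the signal count as zero
                acc += t * samples[j]
        out.append(acc)
    return out


# Usage: 0.1 s of a 1 kHz tone sampled at 16 kHz becomes 800 samples at 8 kHz.
tone_16k = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(1600)]
tone_8k = downsample_by_2(tone_16k)
```

This pure-Python version is only meant to show the idea; for a whole corpus, a dedicated resampling tool or library would be much faster.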

Any tips for telephony speech recognition? I would be thankful!

nshmyrev · November 28, 2021, 10:58am

What language do you need? English?

Bleau · November 29, 2021, 8:45am

I’m interested in English and German models

nshmyrev · November 29, 2021, 11:00pm

NeMo Conformer-CTC is good for telephony.

Bleau · November 30, 2021, 1:30pm

Does this model work on 8 kHz audio? I saw in the docs that it is trained on 16 kHz audio…

nshmyrev · November 30, 2021, 1:44pm

Yes, the models are trained on both and work on both too.

Bleau · December 2, 2021, 3:41pm

Thanks @nshmyrev
Do you have a link to the model?
Do they provide some example scripts for testing it?

nshmyrev · January 5, 2022, 12:32am

Check https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/stt_fr_conformer_ctc_large