Running DeepSpeech with a pretrained model gives me very bad results

LucieDevGirl · March 17, 2021, 5:16pm

Hi,
I downloaded the pretrained French model and tried to run it as indicated in the docs:

deepspeech --model output_graph.pbmm --scorer kenlm.scorer --audio my_audio_file.wav

I got the pbmm and scorer files from this page: https://github.com/common-voice/commonvoice-fr/releases/tag/fr-v0.6

Here is what I get for a file taken from the Common Voice dataset.
Expected: il sera supérieur à quatre pourcent soit vingt milliards de plus
Got: il sera supérieur à quatre pourcent se livilla et

I wrote “very” bad results, but I know this one is not so terrible. It comes from the dataset, though. If I use this pretrained model on real-life voice messages, there are a lot of mistakes. For example, this message:
Vous voulez vraiment que je m'énerve ? Non mais vous voulez vraiment que je m'énerve
becomes :
mais rudement le bar de vraiment plaire

I suppose the results are not good enough because of the lack of data in the French dataset, so I am actively contributing to the Common Voice French dataset.

So I have two questions:

1. Did I do something wrong, or is the model just not good enough today, and will it improve with time?
2. I plan to build my own dataset from the thousands of hours of recorded conversations I have, but I would use Azure Speech-to-Text for this project, which means spending some money before training. So my second question is: will the model be better with a thousand more hours? I am wondering because I tried the English model (version 0.9.3) and got some mistakes too.
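
One thing that may be worth ruling out for the real-life voice messages is the audio format. The sketch below assumes the released models expect 16 kHz, mono, 16-bit PCM WAV input (true of the official DeepSpeech releases as far as I know, but please check the release notes of the French model). The helper names `format_problems` and `check_wav` are hypothetical, and this is only a quick sanity check, not a diagnosis.

```python
import sys
import wave

# Assumed input format for the pretrained models: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit


def format_problems(channels, sample_width, frame_rate):
    """Return a list of human-readable mismatches with the assumed format."""
    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if sample_width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * sample_width}-bit samples, expected 16-bit")
    if frame_rate != EXPECTED_RATE:
        problems.append(f"{frame_rate} Hz, expected {EXPECTED_RATE} Hz")
    return problems


def check_wav(path):
    """Read the WAV header at `path` and report format mismatches."""
    with wave.open(path, "rb") as wav:
        return format_problems(
            wav.getnchannels(), wav.getsampwidth(), wav.getframerate()
        )


if __name__ == "__main__" and len(sys.argv) > 1:
    issues = check_wav(sys.argv[1])
    print("looks fine" if not issues else "; ".join(issues))
```

If the script is saved as, say, `check_wav.py`, it can be run with `python check_wav.py my_audio_file.wav`. A mismatch does not prove the format is the cause of the bad transcripts, but converting the recording before inference is cheap to try.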

lissyx · March 18, 2021, 9:37am

It’s mostly on par with what other contributors like @ptitloup could verify on their own datasets.

Hard to be definitive, but a few thousand more hours can’t hurt.

It’s not perfect, as documented by the WER in the release; but compare it with an early pretrained model we released with ~1200h of English (I think 0.1 or 0.2?), and you can see the improvement.
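
For readers unfamiliar with the metric: WER (word error rate) is the word-level edit distance between the reference and the recognized text, divided by the number of reference words. Below is a minimal sketch applied to the Common Voice sentence from the first post. The function name `word_error_rate` is hypothetical, and the sketch splits on whitespace only, with no text normalization, so it will not necessarily match the figures published with the release.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "il sera supérieur à quatre pourcent soit vingt milliards de plus"
hypothesis = "il sera supérieur à quatre pourcent se livilla et"
print(f"WER = {word_error_rate(reference, hypothesis):.2f}")
```

On this pair the first six words match and the remaining five reference words are all wrong or missing, so the result is 5 errors over 11 reference words, about 0.45.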

ptitloup · March 18, 2021, 1:45pm

Just tell me if you want me to test it on my side. Send me your audio or video and I will run it.

LucieDevGirl · March 18, 2021, 2:46pm

OK, thank you. I will send it to you by private message.

nice thank you

Why didn’t I think of that … Thank you !

LucieDevGirl · March 23, 2021, 7:22am

Sorry, did you listen to the audio file I sent you last week?

ptitloup · March 23, 2021, 9:02am

Yes, I just answered you by PM. If you have any questions, don’t hesitate to ask me; maybe we can talk about our projects.