Hi,
I downloaded the pre-trained French model and tried to run it as described in the docs:
deepspeech --model output_graph.pbmm --scorer kenlm.scorer --audio my_audio_file.wav
I got the pbmm and scorer files from this page: https://github.com/common-voice/commonvoice-fr/releases/tag/fr-v0.6
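One thing I can rule out on my side is the audio format. As far as I understand, DeepSpeech models are usually trained on 16 kHz, mono, 16-bit PCM audio; I am assuming the same holds for this French model, which I have not verified. Here is a minimal sketch using only Python's standard `wave` module to inspect a file before passing it to `deepspeech`. The function names and the 16000 default are my own choices, not part of DeepSpeech:

```python
import sys
import wave


def describe_wav(path):
    """Read the basic PCM parameters of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_expected_format(info, sample_rate=16000):
    """True if the file is mono, 16-bit, at the assumed training sample rate."""
    return (
        info["sample_rate"] == sample_rate
        and info["channels"] == 1
        and info["sample_width_bytes"] == 2
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    info = describe_wav(sys.argv[1])
    print(info)
    if not matches_expected_format(info):
        print("Format differs from the assumed 16 kHz mono 16-bit input.")
```

If the real-life recordings turn out to be stereo or at another sample rate, that alone could explain part of the gap with the dataset files.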
For a file containing the first sentence below (taken from the Common Voice dataset), I get the second line as output:
il sera supérieur à quatre pourcent soit vingt milliards de plus
il sera supérieur à quatre pourcent se livilla et
I wrote “very” bad result, although I know this particular result is not so terrible. Still, this file comes from the dataset itself. When I use the pre-trained model on a real-life voice message, there are many more mistakes. For example, this message:
Vous voulez vraiment que je m'énerve ? Non mais vous voulez vraiment que je m'énerve
becomes:
mais rudement le bar de vraiment plaire
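To put a number on "bad", the two transcriptions above can be scored with a word error rate (WER): the word-level edit distance between the expected sentence and the output, divided by the number of expected words. This is a minimal sketch in plain Python, not the evaluation code used by DeepSpeech; the normalization (lowercasing, dropping punctuation, keeping apostrophes) is my own assumption:

```python
import re


def normalize(text):
    """Lowercase and split into words, dropping punctuation but keeping apostrophes."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two sentences."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    # previous[j] = edits needed to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    n_ref = len(normalize(reference))
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / n_ref


pairs = [
    ("il sera supérieur à quatre pourcent soit vingt milliards de plus",
     "il sera supérieur à quatre pourcent se livilla et"),
    ("Vous voulez vraiment que je m'énerve ? Non mais vous voulez vraiment que je m'énerve",
     "mais rudement le bar de vraiment plaire"),
]
for reference, hypothesis in pairs:
    print(f"WER {wer(reference, hypothesis):.2f}: {hypothesis}")
```

Scoring a batch of my own recordings this way would make it easier to compare the dataset files with the real-life messages.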
I suppose the results are not good enough because the French dataset lacks data, so I am actively contributing to the Common Voice French dataset.
So I have two questions:
- Did I do something wrong, or is the model simply not good enough yet and will it improve over time?
- I plan to build my own dataset from the thousands of hours of conversation audio I have, but I will use Azure Speech-To-Text for this project and so will spend some money before training. My second question is: will the results be better with a thousand more hours? I am wondering because I tried the English model (version 0.9.3) and got some mistakes there too.