Train French Model

illi88 · May 14, 2019, 12:50pm

Hello guys!!
I would like to train my own French model, but the Common Voice data is not enough to produce a powerful model.

I want to know where I can collect more data. If you have links, do not hesitate!!
What is the minimum number of hours needed to get a good model with good results?

lissyx · May 14, 2019, 2:24pm

You are welcome to contribute on the Français (fr) category on Mozilla Discourse and on the GitHub repository common-voice/commonvoice-fr (tooling for producing the French dataset for Common Voice). There’s already a list of datasets, Common Voice among them, that you can use.

Regarding the minimum number of hours, that depends on your definition of a good model and good results. Besides, have a look at the issues on the GitHub repo linked above: there’s already a list of actionable items to help fix and improve the quality of the current datasets, including Common Voice in French.

pete · May 15, 2019, 6:17am

Hello! If you only need to know what’s being said, that is, to pick out “keywords”, you can get decent results (WER around 30%, if the LM is even average; this is something someone else could comment on) using just 100-200 hours of domain-specific training data. But if you would also like to get all the “stopwords” and train a general model that handles all kinds of subjects, then you need hundreds and hundreds of hours of training data. (Baidu used several thousand hours to train their model.)
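For anyone unsure what a WER of 30% means in practice, here is a minimal sketch of how word error rate is usually computed: the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the French example sentences are made up for illustration, and the sketch skips text normalization (case, punctuation), which real evaluations normally apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: 10 reference words, 3 errors
# (one substitution "chat" -> "chas", one dropped "le", one "du" -> "de").
reference = "le chat dort sur le grand canapé rouge du salon"
hypothesis = "le chas dort sur grand canapé rouge de salon"
print(f"WER: {word_error_rate(reference, hypothesis):.0%}")  # WER: 30%
```

So a WER of 30% means roughly three word errors for every ten reference words, which can still be enough when only the keywords matter.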

illi88 · May 15, 2019, 11:49am

@pete clear answer thank you

illi88 · May 15, 2019, 11:50am

@lissyx thank you for your answers
