Regarding the minimum number of hours, that depends on how you define a good model and good results. Also, have a look at the issues on the GitHub repo linked above: there’s already a list of actionable items to help fix and improve the quality of the current datasets, including Common Voice in French.
Hello! If you only need to know what’s being said, that is, to pick out the “keywords”, you can get decent results (WER around 30%, provided the LM is at least average; someone else could comment on that) using just 100-200 hours of domain-specific training data. But if you also want to get all the “stopwords” right and train a general model that handles all kinds of subjects, then you need hundreds and hundreds of hours of training data. (Baidu used several thousand hours to train their model.)
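In case it helps to make the WER figure concrete: WER is usually computed as the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model output, divided by the number of words in the reference. Below is a minimal sketch in plain Python. The function name `word_error_rate` and the example sentences are my own, not taken from any particular toolkit, and real evaluations normally normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: 10 reference words, 3 of them misrecognized.
    ref = "please turn on the kitchen light and close the door"
    hyp = "please turn on a kitchen lights and close that door"
    print(f"WER = {word_error_rate(ref, hyp):.0%}")
```

So a WER of 30% means roughly three word errors for every ten reference words. That can be enough when you only care about a handful of keywords, but it is noticeable when you need a full transcript.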