Using Common Voice datasets?

I set up and ran the web-mic example, which uses the pre-trained models
(these are about 1GB).
The results were sub-optimal, and I'm wondering how to improve them.
What about the Common Voice datasets (50GB), available for download here: would they help?
If so, how would I go about replacing the above files with them?

Any other ideas for improvement?

Thanks very much for any ideas.

Are we talking about the English model here? It includes Common Voice.

Without more information, it is hard to say anything:

Thank you.

Yes, it's about the English model.

I wondered whether we should train the model on more datasets in order to improve the WER, and if so, which ones would you recommend (for general recognition)?


The model works fine for somewhat slow American English, as this is the data that is freely available for training. Depending on what you want to recognize, find a few hundred hours of that kind of audio and fine-tune the model on it.
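Before fine-tuning, it helps to know how many hours of your own audio you actually have. Below is a minimal sketch using only the Python standard library. It assumes a hypothetical layout where each `clip.wav` sits next to a `clip.txt` holding its transcript, and it writes the three-column CSV (`wav_filename`, `wav_filesize`, `transcript`) that DeepSpeech's training importers produce. The 16 kHz / mono / 16-bit check and the lowercasing are assumptions about what the released English model expects; verify them against the version you use.

```python
import csv
import wave
from pathlib import Path

# Assumption: the pre-trained English model expects 16 kHz, mono, 16-bit PCM WAV.
EXPECTED_RATE = 16000


def read_wav(path):
    """Return (duration in seconds, whether the format matches the assumed input)."""
    with wave.open(str(path), "rb") as w:
        ok = (
            w.getframerate() == EXPECTED_RATE
            and w.getnchannels() == 1
            and w.getsampwidth() == 2
        )
        return w.getnframes() / w.getframerate(), ok


def build_manifest(audio_dir, out_csv):
    """Write a training CSV for clips that have a transcript; return total hours."""
    total_seconds = 0.0
    rows = []
    for wav_path in sorted(Path(audio_dir).glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")  # hypothetical transcript layout
        if not txt_path.exists():
            continue  # skip clips without a transcript
        seconds, ok = read_wav(wav_path)
        if not ok:
            continue  # skip clips that would need resampling or conversion first
        # Assumption: the model's alphabet uses lowercase letters.
        transcript = txt_path.read_text(encoding="utf-8").strip().lower()
        total_seconds += seconds
        rows.append([str(wav_path.resolve()), wav_path.stat().st_size, transcript])
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        writer.writerows(rows)
    return total_seconds / 3600.0
```

For example, `hours = build_manifest("my_clips", "train.csv")` followed by `print(f"{hours:.1f} hours usable")` tells you whether you are anywhere near the few hundred hours suggested above. Clips in other formats are skipped rather than silently included, so convert them first if the count looks low.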

Hi Olaf,

Would one hundred hours suffice?


Depends on what you want to do. What is your use case?