Using common voice datasets?

Lior_M · November 16, 2020, 12:15pm

Hi,
I set up and used the web-mic example, which uses the models
deepspeech-0.8.0-models.pbmm
deepspeech-0.8.0-models.scorer
These are about 1GB.
The results were sub-optimal, and I’m wondering how to improve them.
I was wondering about the Common Voice datasets (50GB), available for download here - https://commonvoice.mozilla.org/en/datasets. Would they help?
If so, how would I go about replacing the above files with it?

Any other ideas for improvements?

Thanks very much for any ideas.
Lior

othiele · November 16, 2020, 12:34pm

Are we talking about the English model here? It includes Common Voice.

Without more information, it is hard to say anything:

Lior_M · November 16, 2020, 8:52pm

Thank you.

Yes, it's about the English model.

I wondered whether we should train the model on more datasets in order to improve WER, and if so, which ones you would recommend (for general recognition).

thanks
Lior

othiele · November 17, 2020, 8:52am

The model works fine for somewhat slow American English, as this is the data that is freely available to train on. Depending on what you want to recognize, find some hundred hours of that kind of audio and fine-tune the model on it.
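
To check whether fine-tuning actually helps, one option is to compare WER on the same held-out recordings before and after. Here is a minimal sketch in plain Python, assuming you already have the reference transcripts and the model's output as strings. The function names and sample sentences are made up for illustration, and this is not DeepSpeech's own evaluation code:

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance between two lists of words."""
    # prev[j] = distance between the reference words seen so far and hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """WER over (reference, hypothesis) string pairs: total errors / total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        errors += edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


# Hypothetical sample: one dropped word in four reference words.
sample = [("turn on the light", "turn on light")]
print(word_error_rate(sample))  # 0.25
```

Run it on the same test sentences with the released model and with the fine-tuned one; a lower number on audio from your own use case is what matters.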

Lior_M · November 17, 2020, 9:31am

Hi Olaf,

Would one hundred hours suffice?

thanks
Lior

othiele · November 17, 2020, 9:34am

Depends on what you want to do. What is your use case?