Steps to get DeepSpeech working in the wild

Hello everyone,
kdavis recently created a post asking us to share our use cases for DeepSpeech.

We often see people in the forum asking each other (I created a post about this today and shared my experiences with Common Voice):

Can you share your checkpoints, hyperparameters, amount of data, and …

So I created this post to ask you all: if possible, please share the steps you took to get there. Any information will be appreciated. This can help others use DeepSpeech better and build their models faster.
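To make the request concrete, this is the kind of run configuration that is useful to share. A minimal sketch of a DeepSpeech.py training invocation, assuming the standard training script; every path and hyperparameter value here is a placeholder, not a recommendation:

```bash
# Illustrative DeepSpeech training run -- all paths and values below are
# placeholders; consult the training docs for your DeepSpeech version.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --checkpoint_dir checkpoints/ \
  --export_dir exported_model/ \
  --n_hidden 2048 \
  --epochs 30 \
  --learning_rate 0.0001 \
  --dropout_rate 0.15 \
  --train_batch_size 64
```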

Sorry, can’t share checkpoints for our commercial project, but you can find good information on several language models in the repo by @dan.bmh, which is linked in the thread you hijacked :slight_smile:

He posts checkpoints, hyperparameters, and data, and even shows how he builds his language models. Please study this and other resources to understand how DeepSpeech works. You will have to put some time into reading the docs and searching this forum.
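For context, DeepSpeech language models are typically built with KenLM and then packaged into a scorer. A rough sketch, assuming KenLM's lmplz/build_binary and the generate_scorer_package tool from the native client are available; the corpus, vocabulary file, and alpha/beta values are placeholders you would tune yourself (e.g. with lm_optimizer.py):

```bash
# Train a 5-gram language model on a text corpus (corpus.txt is a placeholder)
lmplz --order 5 --text corpus.txt --arpa lm.arpa
build_binary lm.arpa lm.binary

# Package the binary LM into a DeepSpeech scorer; alpha/beta are example
# values only -- tune them on your dev set
./generate_scorer_package \
  --alphabet alphabet.txt \
  --lm lm.binary \
  --vocab vocab.txt \
  --package kenlm.scorer \
  --default_alpha 0.93 \
  --default_beta 1.18
```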


First, thanks for the link. That was very helpful.

Could you tell me how well your model (built with Mozilla DeepSpeech) performs compared with Google's STT system?

And how many hours of audio did you have?
I have the Persian Common Voice data and many episodes of Persian talk shows. I think this is a hard dataset.

Google is better, but they use a couple thousand hours more than us, and you can't run their system on premise :slight_smile:

We currently have a thousand-plus hours, but you should get OK results with 400-500+ hours for general language, and with less for a specific use case with a limited vocabulary, etc.
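If you want to check where your own dataset lands relative to those numbers, here is a quick way to total the hours, assuming WAV clips and an installed sox (the directory name is a placeholder):

```bash
# Sum the durations of all wav clips and print the total in hours;
# requires sox (soxi) and assumes a clips/ directory of wav files
find clips/ -name '*.wav' -print0 \
  | xargs -0 soxi -D \
  | awk '{ s += $1 } END { printf "%.1f hours\n", s / 3600 }'
```

Common Voice ships mp3 clips; soxi can read those too if your sox build has mp3 support.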