Why are my results so bad using the downloadable model?

mbonsign · April 6, 2020, 8:07am

I used professionally recorded English verses, and the accuracy was terrible. I rerecorded them at the preferred sample rate, and the results were no better. For now, I’m using Google’s STT. Do you know of any quantitative analysis of how well DeepSpeech performs compared to Google’s STT? If no one has it working with 99% accuracy, why should I keep trying?

othiele · April 6, 2020, 8:13am

Because you are not sending private conversations into the cloud, and you can do it cheaper at scale.

If you want the best results for STT, I recommend sending the audio to AWS, Google, MS and IBM and combining the results. It will cost you a bit, but the results will be more accurate. There are always tradeoffs.

As for what the current release can do well, search this forum a bit.

dabinat · April 6, 2020, 6:33pm

Can you give an example of what the text was supposed to be and what DeepSpeech thought it was?
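
If you still have the expected text and the output side by side, one common way to put a number on the difference is word error rate (WER): the word-level edit distance between the two, divided by the number of words in the expected text. Below is a minimal sketch in plain Python. The function name `word_error_rate` and the sample strings are made up for illustration and are not part of DeepSpeech; it only lowercases and splits on whitespace, so punctuation differences count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only, not real DeepSpeech output.
expected = "the birch canoe slid on the smooth planks"
recognized = "the birch can slid on smooth planks"
print(f"WER: {word_error_rate(expected, recognized):.2f}")
```

Running the same function over the output of each engine on the same recordings gives a like-for-like comparison. Lower is better, and the value can exceed 1.0 when the output contains many extra words.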

mbonsign · April 6, 2020, 7:52pm

Sorry. It was so bad that I just deleted DeepSpeech. The best accuracy it got was about 60%. I was using samples from the Open Speech Repository.