Hey there. Sorry for the newbie question, but could someone point me toward any info about the performance of DeepSpeech on the Common Voice English dataset (is it considered a benchmark yet?). I saw a WER of ~40% on the test set mentioned for one of the DeepSpeech versions, and that's it; I couldn't find anything else. I'd appreciate any links to blogs or papers that report performance on English Common Voice.
search for “word error rate” here:
Since Mozilla offers a great model, as @baconator mentions, few people train on Common Voice alone. It depends heavily on the test set, but I would guess you'd end up at a WER of 0.15–0.20 for most versions of STT. See some WERs for other languages here
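For anyone new to the metric being quoted here: WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the model's output, divided by the number of reference words. A minimal sketch (the function name is illustrative, not from DeepSpeech):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One deleted word out of six reference words -> WER of ~0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

So a WER of 0.15–0.20 means roughly one word in five to seven is wrong relative to the reference, which is why errors in the reference transcripts themselves (as with Common Voice) distort the number.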
Thanks for the link. I already checked it out and found only the WER reported for the LibriSpeech test-clean set, which, I assume, is much lower than what the same model would get on Common Voice. Do you know if it makes sense to extrapolate WER on Common Voice from WER on LibriSpeech test-other? Both sets have a fair amount of accented speech, though the recording conditions differ.
Thanks @othiele, will have a look. Cheers
The reason LibriSpeech is used as a benchmark is that its test set is very accurately transcribed. Common Voice currently has errors in the validated dataset, so it's less suitable for benchmarking.