Help with understanding benchmarks - are we at 5.6% word error rate on LibriSpeech Clean+Other?

Hey all,

I realize that benchmarks can be a tricky subject, but I would really like to get some baseline numbers for the performance of mozilla/DeepSpeech on English speech, so we have something to compare with / aim for as we develop against other languages.

Per another comment on this forum, I googled for GitHub issues containing the word ‘benchmark’. I see, for example, this result from Feb 20:

" Test of Epoch 11 - WER: 0.206361, loss: 40.8849374084, mean edit distance: 0.108141 "

Is that a 20.6% WER and is it comparable to the “WER test-other” column from https://github.com/syhw/wer_are_we#librispeech ?
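(For context, WER is conventionally the word-level edit distance between the decoded transcript and the reference, divided by the number of words in the reference, so 0.206361 would indeed read as a 20.6% word error rate. A minimal sketch of that calculation - illustrative only, not DeepSpeech’s actual implementation:)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One dropped word out of six reference words -> WER of 1/6
print(wer("the cat sat on the mat", "the cat sat on mat"))
```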

Apologies if I am getting the wrong end of the stick… I will adjust the subject line of this question based on feedback if necessary.

Thanks

This is a baseline to judge streaming architectural changes against, nothing more.

We selected a small data set to train on so as not to spend too much time training. Thus the WER results are not optimal DeepSpeech results; they are simply a quick test to measure streaming architectural changes against.

For more optimal WER results, refer to the release notes[1], which state that DeepSpeech “…achieves a 5.6% word error rate on the LibriSpeech clean test corpus”.

I hope that helps
