Benchmark results with v0.3.0?

It’s reported that the WER on LibriSpeech’s test-clean set was 6.5%. I believe this was with DeepSpeech v0.1.1. Do we have any results on the same benchmark data with the newer versions, 0.2.0 and/or 0.3.0?

When I tested on my little proprietary test set, 0.3.0 was slightly worse than 0.2.0. I was wondering if this was an anomaly or a trend.
Thanks!

6.5% was with v0.1, but that number turned out to be inflated by the test data accidentally being included in the data used to build the language model. v0.2 then saw an increase in the error rate, due both to the reduction in model size for streaming and to the removal of the offending data from the language model’s construction. v0.3 fixed some non-deterministic behavior bugs that were introduced in v0.2, but the acoustic model released with it is the same as the one in v0.2, so the v0.3 error rate is probably the more accurate one. We’re working on different features that will hopefully bring the WER back down to under 10%.
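For anyone comparing numbers across releases, it may help to be precise about what WER measures. Here’s a minimal sketch (not DeepSpeech’s own evaluation code, just an illustration) that computes word error rate as word-level edit distance divided by the number of reference words:

```python
# Minimal word error rate (WER) sketch: substitutions + insertions +
# deletions (word-level Levenshtein distance) divided by the number of
# reference words. Illustration only; assumes a non-empty reference.
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution/match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 0.333...
```

Note that corpus-level WER is usually computed by summing edits and reference lengths over all utterances, rather than averaging per-utterance rates.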

Awesome. Thanks for the information!

Can you please expand on this? I don’t understand it clearly. Does it mean that the transcripts of the test data set were included in the language corpus when building the language model?

Yes. This was corrected in the LM released with v0.2 and v0.3.
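
To make the fix concrete: the leak was that test-set transcripts ended up in the text corpus used to build the language model, which inflates the benchmark. Here’s a hedged sketch of the kind of filtering involved; the file names and paths are hypothetical, not the actual DeepSpeech pipeline:

```python
# Hypothetical sketch of removing test-set transcripts from an LM
# training corpus before building the language model. File paths are
# made up for illustration; this is not the actual DeepSpeech tooling.
def normalize(line: str) -> str:
    # Match on lowercased, whitespace-collapsed text so trivial
    # formatting differences don't hide a leak.
    return " ".join(line.lower().split())

with open("test_clean_transcripts.txt") as f:
    test_sentences = {normalize(line) for line in f if line.strip()}

kept, dropped = 0, 0
with open("lm_corpus.txt") as src, open("lm_corpus_clean.txt", "w") as dst:
    for line in src:
        if normalize(line) in test_sentences:
            dropped += 1          # exact overlap with the test set: leak
        else:
            dst.write(line)
            kept += 1

print(f"kept {kept} lines, dropped {dropped} overlapping lines")
```

The cleaned corpus would then be fed to the LM builder (e.g. KenLM) so the language model can’t simply memorize the test sentences.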