DeepSpeech accuracy data for Librispeech

Excellent work on the good WER in this release (5.66% on Librispeech test-clean).

We are trying to reproduce the release accuracy, but we found that the training dataset is Fisher+Librispeech+SwitchBoard, and the Fisher and SwitchBoard datasets are not free of charge.

So my questions are:

· Do you have accuracy data for a model trained on Librispeech alone (trained on the 960 hours, tested on test-clean/test-other)?

· Could you provide the WER with and without the language model?

@kdavis-mozilla

BTW, I found the 12% WER from this issue, but I suppose there may be an updated figure for it.

Best Regards.

Xiaohui

Thanks!

I just ran a benchmark training the current master on all of librispeech train (clean + other) and running a test on all of librispeech test (clean + other) and got a WER of 20.6%[1], not an amazing result.

This result was obtained without tuning any hyperparameters to Librispeech, using the same hyperparameters as the release run that trained on Fisher+Librispeech+SwitchBoard.
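For anyone comparing numbers across these runs, here is a minimal sketch of how a pooled WER over several test sets (for example clean + other) can be computed. It is not DeepSpeech's evaluation code: the function names are hypothetical, the transcripts are assumed to be already normalized and whitespace-tokenized, and the project's own script may normalize or average differently.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists."""
    # prev[j] = edits needed to turn the reference prefix seen so far into hyp_words[:j]
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Pooled WER: total word edits divided by total reference words.

    `pairs` is an iterable of (reference, hypothesis) transcript strings.
    """
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += word_edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("no reference words to score against")
    return total_edits / total_words
```

Pooling the edits and the reference word counts is not the same as averaging per-set WERs, so a combined clean + other figure is not directly comparable to a test-clean-only figure.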

Thanks for your response.

I understand that a 20.6% WER is acceptable considering the absence of a language model, and PyTorch got a very similar WER (21%) with DeepSpeech2.

Could you post the best WER on Librispeech after hyperparameter tuning?

FYI, Baidu published DeepSpeech3, and with an RNN-transducer or attention model we could get better accuracy without any language model.

Best Regards.

Xiaohui

Unfortunately training takes time and our servers are booked running benchmarks for our new release. So we don’t really have time to train a model on Librispeech and also tune the associated hyperparameters.

On DeepSpeech3, one of the main reasons RNN-transducer or Attention bested CTC was that they trained on lots of data. RNN-transducer and Attention were able to create an implicit language model. Librispeech is not a large data set. So I doubt the results could be matched using only Librispeech.

I agree with you about the Librispeech dataset for DeepSpeech3. Since you have the Fisher+SwitchBoard datasets, you have a chance to reproduce the paper's accuracy for DeepSpeech3.

BTW, do you know the exact LDC catalog numbers for the Fisher and SwitchBoard datasets?

I got the following result, but it does not match the paper.

Thanks.

Unfortunately, we haven’t had a chance to reproduce the DeepSpeech3 results. However, we have two open issues to do so

But we don’t have the human/compute resources to tackle them now.

As for the data sets, the Fisher data set is from LDC2004T19, LDC2004S13, LDC2005T19, and LDC2005S13. The SwitchBoard data set is from LDC97S62.