DeepSpeech accuracy data for librispeech

Zhao_Xiaohui · February 27, 2018, 1:24am

Excellent work on the WER in this release (5.66% on librispeech test-clean).

We are trying to reproduce the release accuracy, but we found that the training data is Fisher+Librispeech+SwitchBoard, and the Fisher and SwitchBoard datasets are not free of charge.

So my questions are:

· Do you have accuracy data for a model trained on librispeech only? (trained on the 960 hours, tested on test-clean/test-other)

· Could you provide the WER with and without the language model?

@kdavis-mozilla

BTW, I found the 12% WER from this issue, but I suppose there may be an updated figure for it.

Best Regards.

Xiaohui

kdavis · February 27, 2018, 9:25am

Thanks!

I just ran a benchmark training the current master on all of librispeech train (clean + other) and running a test on all of librispeech test (clean + other) and got a WER of 20.6%[1], not an amazing result.

This result was obtained without tuning any hyperparameters for librispeech, using the same hyperparameters as in the release run that trained on Fisher+Librispeech+SwitchBoard.
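For readers comparing the numbers in this thread: WER is the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the hypothesis, divided by the number of reference words. The sketch below is only an illustration of that definition. The function name `word_error_rate` is hypothetical, and this is not the DeepSpeech evaluation code, which may normalize text or aggregate over the test set differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, so it is an error rate and not an accuracy.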

Zhao_Xiaohui · March 5, 2018, 2:43am

Thanks for your response.

I understand the 20.6% WER is acceptable considering the absence of a language model, and PyTorch got a very similar WER (21%) on DeepSpeech2.

Could you share the best WER on librispeech after hyperparameter tuning?

FYI, Baidu published DeepSpeech3, and we could get better accuracy with RNN-transducer or Attention without any language model.

Best Regards.

Xiaohui

kdavis · March 5, 2018, 5:41am

Unfortunately training takes time and our servers are booked running benchmarks for our new release. So we don’t really have time to train a model on Librispeech and also tune the associated hyperparameters.

On DeepSpeech3, one of the main reasons RNN-transducer or Attention bested CTC was that they trained on lots of data. RNN-transducer and Attention were able to create an implicit language model. Librispeech is not a large data set. So I doubt the results could be matched using only Librispeech.

Zhao_Xiaohui · March 5, 2018, 6:09am

I agree with you about the librispeech dataset for DeepSpeech3. Since you have the Fisher+SwitchBoard datasets, you have a chance to reproduce the paper's accuracy for DeepSpeech3.

BTW, do you know the exact LDC catalog numbers for the Fisher and SwitchBoard datasets?

I got the following result, but it does not match the paper.

Thanks.

kdavis · March 5, 2018, 8:46am

Unfortunately, we haven’t had a chance to reproduce the DeepSpeech3 results. We have two open issues to do so, but we don’t have the human/compute resources to tackle them now.

As for the data sets, the Fisher data set is from LDC2004T19, LDC2004S13, LDC2005T19, and LDC2005S13. The SwitchBoard data set is from LDC97S62.
