Excellent work on the WER in this release (5.66%
on LibriSpeech test-clean).
We are trying to reproduce the release accuracy, but we found that the training data is Fisher+LibriSpeech+SwitchBoard, and the Fisher and SwitchBoard datasets are not free of charge.
So my questions are:
· Do you have accuracy numbers for a model trained only on LibriSpeech (trained on the 960 hours, tested on test-clean/test-other)?
· Could you provide the WER with and without the language model?
I just ran a benchmark: I trained the current master on all of the LibriSpeech training data (clean + other), tested on all of the LibriSpeech test data (clean + other), and got a WER of 20.6%[1], which is not an amazing result.
This result was obtained without tuning any hyperparameters for LibriSpeech; I used the same hyperparameters as the release run that trained on Fisher+LibriSpeech+SwitchBoard.
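For anyone comparing the numbers in this thread, here is a minimal sketch of how WER is usually computed: word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function names `word_edit_distance`, `word_error_rate` and `corpus_wer` are hypothetical and not taken from the DeepSpeech code, and pooling the errors over the whole test set is an assumption; the evaluation script behind the figures quoted here may normalize text or aggregate differently.

```python
from typing import Iterable, Tuple


def word_edit_distance(ref_words: list, hyp_words: list) -> int:
    """Levenshtein distance between two word lists (hypothetical helper)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER of one utterance: edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER pooled over (reference, hypothesis) pairs.

    Assumption: errors and reference words are summed over the whole set,
    rather than averaging the per-utterance WER.
    """
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_errors += word_edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("no reference words in corpus")
    return total_errors / total_words
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Pooled and per-utterance averaging can give different numbers on the same transcripts, which is worth checking before comparing a local run against a published figure.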
Unfortunately training takes time and our servers are booked running benchmarks for our new release. So we don’t really have time to train a model on Librispeech and also tune the associated hyperparameters.
In DeepSpeech3, one of the main reasons RNN-transducer and attention models bested CTC was that they were trained on lots of data, which let them learn an implicit language model. LibriSpeech is not a large dataset, so I doubt the results could be matched using only LibriSpeech.
I agree with you about the LibriSpeech dataset and DeepSpeech3. Since you have the Fisher+SwitchBoard datasets, you have a chance to reach the paper's accuracy for DeepSpeech3.
BTW, do you know the exact LDC catalog numbers for the Fisher and SwitchBoard datasets?
I got the following result, but it does not match the paper.