Has anyone trained from scratch or fine-tuned a DeepSpeech 0.7.0+ model for conversational American English?

A_N · July 4, 2020, 3:35pm

Hello,

I am interested in knowing how many people here have successfully fine-tuned the DeepSpeech models for conversational American English. If you have, how many additional hours of conversational data did you use, and what WER did you achieve?

Or has anyone trained from scratch? If so, with how many hours, and what WER did you get? Were any model changes made in this case?

I have been working with DeepSpeech 0.7.0 (Ubuntu 16.04, Python 3.6.5, TensorFlow 1.15.2, CUDA 10.0/cuDNN 7.6.5, training on an RTX 2080 Ti). Inference with the released DeepSpeech model on my (conversational) test data gives an average WER of ~40%. Changing the language model (with alpha and beta tuned on my custom text) gives an improvement of ~1%. I have fine-tuned with about 2000 hours of a part-conversational, part-speech dataset (lr 0.00001, dropout_rate 0.40, early stopping at around 35 epochs with es_epochs set to 20) and have seen very marginal to no improvement on my test data. I am still working on improving the LM and am also testing the effect of using no LM.
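For anyone comparing numbers, "average WER" can mean two different things, so here is a minimal sketch of both: the corpus-level WER (total word edits divided by total reference words) and the mean of per-utterance WERs. This is plain Python, not DeepSpeech's own evaluation code. The function names `edit_distance` and `wer_stats` and the sample transcripts are made up for illustration, and lowercasing plus whitespace splitting is an assumed normalization.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_stats(pairs):
    """Return (corpus_wer, mean_utterance_wer) for (reference, hypothesis) pairs.

    Assumption: text is normalized by lowercasing and splitting on whitespace.
    Utterances with an empty reference are skipped.
    """
    total_edits = 0
    total_words = 0
    per_utterance = []
    for ref_text, hyp_text in pairs:
        ref = ref_text.lower().split()
        hyp = hyp_text.lower().split()
        if not ref:
            continue
        edits = edit_distance(ref, hyp)
        total_edits += edits
        total_words += len(ref)
        per_utterance.append(edits / len(ref))
    if not per_utterance:
        raise ValueError("no utterances with a non-empty reference")
    corpus_wer = total_edits / total_words
    mean_utterance_wer = sum(per_utterance) / len(per_utterance)
    return corpus_wer, mean_utterance_wer


# Hypothetical transcripts, only to show how the two averages can differ.
sample_pairs = [
    ("yeah i think so", "yeah i think so"),
    ("uh huh", "uh"),
]
corpus_wer, mean_utterance_wer = wer_stats(sample_pairs)
print(f"corpus WER: {corpus_wer:.3f}, mean per-utterance WER: {mean_utterance_wer:.3f}")
```

On conversational data with many short turns, the mean per-utterance figure can sit well above the corpus-level one, so it helps to state which of the two is being reported.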

I searched here and did not find any recent posts geared specifically towards conversational sets. I am interested in hearing about other people's experience and what has or has not worked for them. Since the released base models were trained on both Fisher and Switchboard (I am guessing these make up at least 1/3 of the total training data), I was expecting slightly better results on conversational data.
Appreciate any input here.
Thanks

