Train csv used for DeepSpeech 0.7.0

A_N · May 24, 2020, 5:40pm

Just wanted to confirm that 0.7.0 was trained on 3817 hours of audio from LibriSpeech, SWB, Fisher and WAMU NPR as mentioned in the forum.

Also, was [Switchboard Cellular Part 1](https://catalog.ldc.upenn.edu/LDC2001S15) used apart from Switchboard 1?

Is it possible to know how much of Fisher was used? It seems to be conversational and the most closely related to the kind of data I am looking at. Was the entire Fisher dataset used?

Is it possible to get the training CSV that you used?

thanks

reuben · May 24, 2020, 6:05pm

Yes.

No.

Yes.

Of course not, most of the datasets you just mentioned are proprietary.

reuben · May 24, 2020, 6:06pm

Most of these answers are available in the release notes of every model we release, by the way.

A_N · May 24, 2020, 7:22pm

Thanks @reuben.
Also, another question based on this thread:

Are there any plans to release FP16 models and to support automatic_mixed_precision when fine-tuning?

thanks

A_N · June 2, 2020, 4:41pm

Thanks @reuben. Sorry if I missed any of those details that are already covered.

I had one more question about your initial training: were all 3817 hours fed into the model at once for the 3-phase training, or was the data broken down by dataset (or in some other fashion) for it?
Does the final outcome of the model (WER) differ depending on whether it was trained on all 3817 hours in one shot or first on the 960 hours of LibriSpeech, then on the 1700 WAMU recordings, and so on (computing the WER with the final cumulative model)? I hope I am making sense.

I am also asking from the standpoint of my own custom training. If I have a 500-hour dataset, will the WER differ if I:
train on the entire 500 hours on top of the released 0.7.0 checkpoint and then test,
or
train on 100 hours on top of the released 0.7.0 checkpoint, then add the next 100 hours, and so on until I reach the same 500 hours, and then test?
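The second option, feeding the data in 100-hour increments, first requires slicing one training CSV into chunks. Below is a minimal Python sketch of that step. It assumes the DeepSpeech-style columns wav_filename, wav_filesize and transcript, and it estimates each clip's duration from wav_filesize under the assumption of 16 kHz, 16-bit, mono PCM WAV files with a 44-byte header. The helpers estimated_seconds, split_by_hours and write_csv are hypothetical, not part of DeepSpeech.

```python
import csv
import io

# Assumption: 16 kHz, 16-bit (2 bytes per sample), mono PCM WAV with a 44-byte header.
BYTES_PER_SECOND = 16000 * 2
WAV_HEADER_BYTES = 44
FIELDS = ["wav_filename", "wav_filesize", "transcript"]


def estimated_seconds(wav_filesize: int) -> float:
    """Rough clip duration derived from the file size alone (hypothetical helper)."""
    return max(wav_filesize - WAV_HEADER_BYTES, 0) / BYTES_PER_SECOND


def split_by_hours(rows, hours_per_chunk):
    """Group CSV rows, in order, into chunks of at least `hours_per_chunk` hours each.

    The last chunk holds whatever is left over and may be shorter.
    """
    limit = hours_per_chunk * 3600
    chunks, current, total = [], [], 0.0
    for row in rows:
        current.append(row)
        total += estimated_seconds(int(row["wav_filesize"]))
        if total >= limit:
            chunks.append(current)
            current, total = [], 0.0
    if current:
        chunks.append(current)
    return chunks


def write_csv(rows, fieldnames=FIELDS):
    """Serialize one chunk back to CSV text (to be saved as e.g. train_part1.csv)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Toy data: three clips of exactly one hour each, split into 2-hour chunks.
# With a real file: with open("train.csv", newline="") as f: rows = list(csv.DictReader(f))
one_hour = str(WAV_HEADER_BYTES + 3600 * BYTES_PER_SECOND)
rows = [{"wav_filename": f"clip{i}.wav", "wav_filesize": one_hour, "transcript": "hello"}
        for i in range(3)]
chunks = split_by_hours(rows, hours_per_chunk=2)
print([len(c) for c in chunks])  # [2, 1]
```

Each chunk, or the running union of chunks, would then serve as the training CSV for one incremental round.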

Also, would you recommend using early stopping when fine-tuning?

Thank you very much for taking the time to answer.

reuben · June 2, 2020, 10:24pm

Yes.

I haven’t tried doing that.

I would recommend training until validation loss stops improving.
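Training until validation loss stops improving is the usual patience-based early-stopping rule. The Python sketch below shows that rule in isolation. The EarlyStopping class and its patience and min_delta parameters are illustrative assumptions, not DeepSpeech's own implementation or flags, and the loss values are made up.

```python
class EarlyStopping:
    """Stop once validation loss has failed to improve for `patience` epochs in a row.

    Hypothetical helper: `patience` and `min_delta` are illustrative knobs.
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def update(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: this epoch's checkpoint is the one worth keeping.
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Made-up validation losses, one per epoch.
stopper = EarlyStopping(patience=2)
for epoch, val_loss in enumerate([50.0, 42.0, 40.0, 40.5, 41.0, 39.0]):
    if stopper.update(epoch, val_loss):
        break
print(epoch, stopper.best_epoch, stopper.best_loss)  # 4 2 40.0
```

The model to keep is the checkpoint from best_epoch, not the last one trained. Note that the epoch-5 loss of 39.0 is never reached here; a larger patience trades extra epochs for a lower chance of stopping too early.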

A_N · June 2, 2020, 10:32pm

Thank you very much for the response. So if I have 500 hours, it would be a good idea to train in one shot and save the model and checkpoints. If I then get a new set of 1000 hours of similar data, should I train on the combined 500+1000 hours from the released 0.7.0 checkpoint, or just train on the new 1000 hours from my saved checkpoint? Is there a recommended approach for this?
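Retraining from the released 0.7.0 checkpoint on all 1500 hours needs the old and new training CSVs combined into one file. Below is a minimal Python sketch, assuming both CSVs share the same header (for example wav_filename, wav_filesize, transcript) and that a repeated wav_filename means the same clip. The merge_training_csvs function and the file names in the comment are hypothetical.

```python
import csv
import io


def merge_training_csvs(inputs, output):
    """Concatenate CSV streams that share one header, skipping repeated wav_filename rows.

    `inputs` are open text streams; `output` is an open, writable text stream.
    Returns the number of rows written.
    """
    fieldnames = None
    seen = set()
    merged = []
    for stream in inputs:
        reader = csv.DictReader(stream)
        if fieldnames is None:
            fieldnames = reader.fieldnames
        elif reader.fieldnames != fieldnames:
            raise ValueError(f"column mismatch: {reader.fieldnames} vs {fieldnames}")
        for row in reader:
            if row["wav_filename"] not in seen:  # assumption: same path means same clip
                seen.add(row["wav_filename"])
                merged.append(row)
    writer = csv.DictWriter(output, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(merged)
    return len(merged)


# With real files (hypothetical names):
# with open("my_500h.csv", newline="") as a, open("new_1000h.csv", newline="") as b, \
#         open("combined.csv", "w", newline="") as out:
#     merge_training_csvs([a, b], out)

header = "wav_filename,wav_filesize,transcript\n"
old = io.StringIO(header + "a.wav,1000,hello\n")
new = io.StringIO(header + "b.wav,2000,world\na.wav,1000,hello\n")
out = io.StringIO()
print(merge_training_csvs([old, new], out))  # 2
```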

Also, should the 3-phase training process be followed for fine-tuning as well?

reuben · June 2, 2020, 10:40pm

The three phase training is simply a consequence of training until validation loss stops improving, then lowering the learning rate and continuing from the best validation loss checkpoint. It’s not necessarily something you’ll have to do for fine-tuning, as the behavior of the training process changes with the dataset. You’ll have to figure out for yourself what works best on your dataset.
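The procedure described in that answer (train until validation loss plateaus, lower the learning rate, resume from the best-validation checkpoint) can be sketched as a small Python loop. This is a toy illustration under stated assumptions, not DeepSpeech's training code: phased_training, toy_train_epoch and toy_validate are hypothetical hooks, the "model" is a single number, and the learning rates are arbitrary. The toy takes fixed-size steps toward a minimum at x = 3, so a step that is too large makes the loss bounce instead of improve, which is the kind of plateau a lower learning rate can resolve.

```python
import copy


def phased_training(state, train_epoch, validate, learning_rates, patience=2):
    """One phase per learning rate; each phase resumes from the best checkpoint so far.

    A phase ends after `patience` consecutive epochs without a new best validation loss.
    """
    best_state, best_loss = copy.deepcopy(state), validate(state)
    history = []  # (phase, learning rate, validation loss) per epoch
    for phase, lr in enumerate(learning_rates, start=1):
        state = copy.deepcopy(best_state)  # continue from the best validation checkpoint
        stale = 0
        while stale < patience:
            state = train_epoch(state, lr)
            loss = validate(state)
            history.append((phase, lr, loss))
            if loss < best_loss:
                best_state, best_loss, stale = copy.deepcopy(state), loss, 0
            else:
                stale += 1
    return best_state, best_loss, history


def toy_train_epoch(x, lr):
    """Toy 'epoch': a step of size lr toward the minimum of (x - 3)**2."""
    grad = 2 * (x - 3)
    return x - lr * ((grad > 0) - (grad < 0))  # step against the sign of the gradient


def toy_validate(x):
    return (x - 3) ** 2


best_x, best_loss, history = phased_training(0.5, toy_train_epoch, toy_validate, [1.0, 0.5, 0.25])
print(best_x, best_loss)  # 3.0 0.0
```

In this toy run, phase 1 (learning rate 1.0) brings the loss from 6.25 down to 0.25 and then bounces between x = 2.5 and x = 3.5. Phase 2 restarts from x = 2.5 with half the step size and reaches the minimum. Phase 3 finds no further improvement, which is where one would stop.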

A_N · June 2, 2020, 10:51pm

Thanks for the clarification.
Would you also be able to comment on this?
So if I have 500 hours, it would be a good idea to train in one shot and save the model and checkpoints. If I then get a new set of 1000 hours of similar data, should I train on the combined 500+1000 hours from the released 0.7.0 checkpoint, or just train on the new 1000 hours from my saved checkpoint? Is there a recommended approach for this?

Thank you very much