Information on pre-trained models

Can someone please help me with the hyperparameters used for training the v0.3.0 model released by DeepSpeech?

Also, I’m trying to use the pre-trained model with the Common Voice corpus for training, but the loss is constant at around 142 (how do I minimize it?). Any help with this would be highly appreciated.

I’m using the following parameters:

--epoch -1000
--test_batch_size 10
--dev_batch_size 10
--train_batch_size 10
--fulltrace True
--log_level 1
--learning_rate 0.01

Thanks

You need to re-use the same hyperparameters that we documented: https://github.com/mozilla/DeepSpeech/releases/tag/v0.3.0

Specifically, your learning rate seems to be way off, and you don’t explain how many epochs you let the training run before concluding that the loss was constant.

Also, how many hours of data do you have?
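
For comparison, here is a sketch of the kind of flag set documented on that release page. Treat the numbers below as illustrative assumptions, not the documented values; take the exact ones from the link above:

--learning_rate 0.0001
--train_batch_size 24
--dev_batch_size 48
--test_batch_size 48
--epoch 30
--n_hidden 2048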

Thanks for your response, it helped me get an idea of the hyperparameters being used.

I am training on the entire Common Voice corpus (the one available from the website).

That value was after the 4th epoch. It is still running, though. How many epochs do you recommend before concluding the loss is constant?

As far as I can remember, that’s around one hundred hours, right? I did similar experiments, training with ~100 hrs of French on top of the current English model, and the loss did decrease steadily over ~25-30 epochs.

Sadly, I don’t remember the early behavior. Maybe you should give more epochs a try?

@lissyx, if we are fine-tuning 0.3.0 from the checkpoint that you provided, aren’t we supposed to use --epoch -3 as in the DeepSpeech documentation?

Where do you see -3 in the documentation?

Note: the released models were trained with --n_hidden 2048, so you need to use that same value when initializing from the release models. Note as well the use of a negative epoch count -3 (meaning 3 more epochs) since the checkpoint you’re loading from was already trained for several epochs.

This comes from the section ‘Continuing training from a release model’. Maybe I misinterpreted it; I am just trying to understand how this works.

Ok, well, this is just an example of how to perform this task, it’s not an order. You need to try and see what works best for you; it can also depend on your dataset and your requirements.
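
For reference, a minimal fine-tuning invocation in the spirit of the snippet quoted above. The file paths and checkpoint directory are placeholders; only --n_hidden 2048 and the negative epoch count come from that documentation:

python DeepSpeech.py \
  --n_hidden 2048 \
  --epoch -3 \
  --checkpoint_dir /path/to/release/checkpoint \
  --train_files /path/to/train.csv \
  --dev_files /path/to/dev.csv \
  --test_files /path/to/test.csv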

@lissyx @reuben @kdavis what’s the meaning of using a negative epoch value?

You quoted it yourself in your previous reply.

It says -3 means three more epochs. What if I used +3? What would that mean? Wouldn’t that mean three more epochs as well?

No, it would mean train until the third epoch.

(When you resume from a checkpoint, it resumes from the previous epoch count.)
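
To make the arithmetic concrete, a worked example, assuming a checkpoint that was saved after 10 epochs (the numbers are illustrative):

--epoch -3  → train 3 more epochs, ending at epoch 13
--epoch 3   → train until epoch 3; the checkpoint is already past it, so no training happens
--epoch 13  → train until epoch 13, i.e. 3 more epochs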

Okay, so one last question: if the pre-trained model was trained for 30 epochs and for fine-tuning I give --epoch +40, what would that mean?