Information on pre-trained models

Can someone please help me with the hyperparameters used for training the v0.3.0 model released by DeepSpeech?

Also, I’m trying to use the pre-trained model with the Common Voice corpus for training, but the loss is constant at around 142 (how do I minimize it?). Any help with this would be highly appreciated.

I’m using the following parameters:

--epoch -1000
--test_batch_size 10
--dev_batch_size 10
--train_batch_size 10
--fulltrace True
--log_level 1
--learning_rate 0.01

Thanks

You need to re-use the same hyperparameters that we documented: https://github.com/mozilla/DeepSpeech/releases/tag/v0.3.0

Specifically, your learning rate seems to be way off, and you don’t explain how many epochs you let the training run before concluding that the loss was constant.

Also, how many hours of data do you have?
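
For comparison, here is a sketch of the kind of flag set documented on that release page. Treat the numbers below as illustrative assumptions, not the documented values; take the exact ones from the link above:

--learning_rate 0.0001
--train_batch_size 24
--dev_batch_size 48
--test_batch_size 48
--epoch 30
--n_hidden 2048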

Thanks for your response, it helped me get an idea of the hyperparameters being used.

I am training on the entire Common Voice corpus (the one available from the website).

That value was after the 4th epoch. It is still running, though. How many epochs do you recommend before concluding the loss is constant?

As far as I can remember, that’s around one hundred hours, right? I did similar experiments, training with ~100 hrs of French on top of the current English model, and the loss did decrease steadily over ~25-30 epochs.

Sadly, I don’t remember the early behavior. Maybe you should give more epochs a try?

@lissyx, if we are fine-tuning 0.3.0 from the checkpoint that you provided, aren’t we supposed to use --epoch -3 as in the DeepSpeech documentation?

Where do you see -3 in the documentation?

Note: the released models were trained with --n_hidden 2048, so you need to use that same value when initializing from the release models. Note as well the use of a negative epoch count -3 (meaning 3 more epochs) since the checkpoint you’re loading from was already trained for several epochs.

This comes from the section ‘Continuing training from a release model’. Maybe I misinterpreted it; I am just trying to understand how this works.

Ok, well, this is just an example of how to perform this task, it’s not an order. You need to try and see what works best for you; it can also depend on your dataset and your requirements.
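
For reference, a minimal fine-tuning invocation in the spirit of the snippet quoted above. The file paths and checkpoint directory are placeholders; only --n_hidden 2048 and the negative epoch count come from that documentation:

python DeepSpeech.py \
  --n_hidden 2048 \
  --epoch -3 \
  --checkpoint_dir /path/to/release/checkpoint \
  --train_files /path/to/train.csv \
  --dev_files /path/to/dev.csv \
  --test_files /path/to/test.csv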

@lissyx @reuben @kdavis what’s the meaning of using a negative epoch value?

You quoted it yourself in your previous reply.

It says -3 means three more epochs. What if I used +3? What would that mean? Wouldn’t that mean three more epochs as well?

No, it would mean train until the third epoch.

(When you resume from a checkpoint, it resumes from the previous epoch count.)
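
To make the arithmetic concrete, a worked example, assuming a checkpoint that was saved after 10 epochs (the numbers are illustrative):

--epoch -3  → train 3 more epochs, ending at epoch 13
--epoch 3   → train until epoch 3; the checkpoint is already past it, so no training happens
--epoch 13  → train until epoch 13, i.e. 3 more epochs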

Okay, so one last question: if the pre-trained model was trained for 30 epochs and for fine-tuning I give --epoch +40, what would that mean?