Default dropout rate 0.05 vs 0.2 used for training

If I read this code correctly (https://github.com/mozilla/DeepSpeech/blob/ce551f53858b71861e6ae02f4b9778a294f68ef5/util/flags.py#L43), the default dropout rate for training (if not overridden from the command line) is 0.05, whereas the documentation for the 0.3.0 release states that a dropout rate of 0.2 was used.

Why the difference? Wouldn't it be more suitable to use a default value closer to the one used for the releases?
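To make the question concrete: a plain training command that doesn't pass the flag, something like the sketch below (file paths are just placeholders), would silently pick up the 0.05 default if I read flags.py right:

```
python -u DeepSpeech.py \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv
# no --dropout_rate given, so the 0.05 default from util/flags.py applies
```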

@yv001 sir, I think that in normal training DeepSpeech uses a 0.05 dropout rate. If you go for continued training, you increase the hidden units to 2048; in that case you can optionally go for a 0.20 dropout rate (0.12 is what I'd recommend). :slightly_smiling_face:

Not sure what you mean here. The parameters documented in the releases (https://github.com/mozilla/DeepSpeech/releases) state that the dropout_rate used was 0.2, while in the code the default seems to be set to 0.05.

Changing the number of hidden units for fine-tuning does not sound like a good idea to me; AFAIK the model architecture should remain the same.

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine tuning

dropout_rate 0.2

Yeah, my point exactly. I'm trying to find out where the default 0.05 dropout rate came from: whether there's a basis for using it as a default, or whether it was an arbitrary pick, since the released models are trained with a 0.2 dropout rate.

Perhaps someone from the Mozilla team could share the answer to that?

The hyperparameters used for the release models were found by experimentation; they don't necessarily agree with the default values in flags.py. There's no value that will work best in every scenario; you have to experiment with your data. 0.2 worked best with our release model setup (datasets, batch size, learning rate, model size).
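If it helps, one simple way to experiment is to run separate trainings with different --dropout_rate values and separate checkpoint directories, then compare the dev/test results. Roughly along these lines (dataset paths are placeholders):

```
for dr in 0.05 0.1 0.2 0.3; do
  python -u DeepSpeech.py \
    --train_files my-train.csv \
    --dev_files my-dev.csv \
    --test_files my-test.csv \
    --dropout_rate "$dr" \
    --checkpoint_dir "ckpt_dropout_$dr"
done
```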

ok, thanks

It was a bit confusing to find out that the values are different when running the fine-tuning command described in https://github.com/mozilla/DeepSpeech#continuing-training-from-a-release-model.
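For anyone else who hits this: if I understand the flags correctly, passing --dropout_rate explicitly alongside the README's fine-tuning command removes the ambiguity and matches the release value. Something like this sketch (checkpoint path and CSVs are placeholders, other flags such as the epoch count left out; see the README for the full command):

```
python -u DeepSpeech.py \
  --n_hidden 2048 \
  --checkpoint_dir path/to/release/checkpoint/ \
  --train_files my-train.csv \
  --dev_files my-dev.csv \
  --test_files my-test.csv \
  --learning_rate 0.0001 \
  --dropout_rate 0.2
```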