Default dropout rate of 0.05 vs. 0.2 used for release training

(Yv) #1

If I read this code correctly, the default dropout rate for training (if not overridden from the command line) is 0.05, whereas the documentation for the 0.3.0 release states that a dropout rate of 0.2 was used.

Why the difference? Wouldn't it be more suitable to use a default value closer to the one used for the releases?
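To illustrate the behaviour described above, here is a minimal sketch using Python's `argparse` as a stand-in for the training script's own flag definitions (an assumption; the real script defines its flags differently). It assumes a flag named `--dropout_rate` with a code default of 0.05: that default only applies when the flag is omitted, so reproducing the documented release value means passing 0.2 explicitly.

```python
import argparse


def parse_training_flags(argv=None):
    """Parse a hypothetical subset of training flags.

    Assumption: the flag is called --dropout_rate and defaults to 0.05, as
    discussed in this thread. The default is used only when the flag is absent.
    """
    parser = argparse.ArgumentParser(description="hypothetical training flags")
    parser.add_argument("--dropout_rate", type=float, default=0.05,
                        help="dropout rate for feedforward layers")
    return parser.parse_args(argv)


# No flag given: the code default (0.05) applies, not the release value.
print(parse_training_flags([]).dropout_rate)
# Flag given explicitly: this is how you would match the documented 0.2.
print(parse_training_flags(["--dropout_rate", "0.2"]).dropout_rate)
```

The two `print` calls output `0.05` and `0.2` respectively, which is the gap between "what the code does by default" and "what the release was trained with".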

(Murugan R) #2

@yv001 sir, I think DeepSpeech actually uses a 0.05 dropout rate in normal training. If you continue training, you increase the hidden units to 2042, so in this case you can optionally go for a 0.20 dropout rate (I'd recommend 0.12). :slightly_smiling_face:

(Yv) #3

Not sure what you mean here. The parameters documented in the releases state that the dropout_rate used was 0.2. In the code, the default seems to be set to 0.05.

Changing the number of hidden units for fine-tuning does not sound like a good idea to me; AFAIK the model architecture should remain the same.

(Murugan R) #4

Hyperparameters for fine-tuning

The hyperparameters used to train the model are useful for fine-tuning:

dropout_rate 0.2

(Yv) #5

Yeah, my point exactly. I'm trying to find out where the default 0.05 dropout rate came from: whether there's a basis for using it as the default or whether it was an arbitrary pick, given that the released models are trained with a 0.2 dropout rate.

Perhaps someone from the Mozilla team could share the answer?

(Reuben Morais) #6

The hyperparameters used for the release models were found by experimentation; they don't necessarily agree with the default values in the code. There's no value that will work best in every scenario, so you have to experiment with your own data. 0.2 worked best with our release model setup (datasets, batch size, learning rate, model size).
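The "experiment with your data" advice above amounts to a small hyperparameter search. Below is a minimal sketch of that idea, not the procedure the release team used: `evaluate` is a hypothetical callable you would supply yourself, which trains or fine-tunes with a given dropout rate on your own dataset and returns a validation loss (lower is better).

```python
from typing import Callable, Iterable, Tuple


def pick_dropout_rate(candidates: Iterable[float],
                      evaluate: Callable[[float], float]) -> Tuple[float, float]:
    """Try each candidate dropout rate and return (best_rate, best_loss).

    `evaluate` is a hypothetical, user-supplied function: it should train or
    fine-tune with the given rate and return a validation loss.
    """
    # Run every candidate once and keep (loss, rate) pairs.
    results = [(evaluate(rate), rate) for rate in candidates]
    if not results:
        raise ValueError("need at least one candidate dropout rate")
    # min() compares loss first, so this picks the lowest validation loss.
    best_loss, best_rate = min(results)
    return best_rate, best_loss
```

Usage would look like `pick_dropout_rate([0.05, 0.1, 0.2, 0.3], my_eval)`, where `my_eval` is your own (hypothetical) training-and-validation routine. Because the other settings (dataset, batch size, learning rate, model size) change which rate wins, the search has to be rerun whenever those change.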

(Yv) #7

OK, thanks.

It was a bit confusing for me to find that the values differ when running the fine-tuning command described in