Confusion for finding best dropout values

Hi,

First, thank you for all your work :slightly_smiling_face:

I’m a bit confused about the dropout parameters. The 0.9.0 release description says the dropout value used for training is 0.4, but there are 5 more dropout parameter values.

In the release checkpoint flags file, these values are described:
– dropout_rate=0.05
– dropout_rate2=0.05
– dropout_rate3=0.05
– dropout_rate4=0.0
– dropout_rate5=0.0
– dropout_rate6=0.05

Does it mean that you used:
– dropout_rate=0.4
– dropout_rate2=0.4
– dropout_rate3=0.4
– dropout_rate4=0.0
– dropout_rate5=0.0
– dropout_rate6=0.4

Last time, I made an Optuna script for dropout parameter optimisation, but only for the first value (–dropout_rate), and I’m not sure that’s right.

Does it make sense to look for different dropout values for each layer?

If you have some details to share, I would be glad :slightly_smiling_face:

thx

@lissyx, @reuben this is indeed a bit strange. Do you know why the flags.txt in the checkpoint for English lists these values? Is this from a continued training with a lower learning rate?

Right, the flags.txt file corresponds to the last training step.

Yes, it means just the first parameter.

This is a matter of how much computational budget and time you have/want to spend for potentially better results.


hi,

Thanks for your answers. You are totally right, it would be very expensive to try optimisation on multiple dropout values…

I will just give a try to a different scenario with value ranges (d0: 0.4-0.6, d2=d3=d6: 0.1-0.3).
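For anyone curious, the idea above can be sketched as a plain random search over those ranges (Optuna’s samplers do this more cleverly, but the structure is the same). Everything here is an assumption for illustration: `evaluate_model` is a hypothetical placeholder for launching an actual DeepSpeech training run with the sampled flags and returning a validation loss or WER.

```python
import random

# Search ranges from the scenario above; d2, d3 and d6 share one value.
RANGES = {
    "dropout_rate": (0.4, 0.6),  # d0
    "shared_rate": (0.1, 0.3),   # d2 = d3 = d6
}

def sample_trial(rng):
    """Draw one dropout configuration from the ranges above."""
    shared = rng.uniform(*RANGES["shared_rate"])
    return {
        "dropout_rate": rng.uniform(*RANGES["dropout_rate"]),
        "dropout_rate2": shared,
        "dropout_rate3": shared,
        "dropout_rate6": shared,
    }

def evaluate_model(params):
    """Hypothetical stand-in: in practice this would launch a training
    run with the given dropout flags and return the validation metric
    to minimise. A dummy score keeps the sketch runnable end to end."""
    return sum(params.values())

def random_search(n_trials=10, seed=0):
    """Sample n_trials configurations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    trials = [sample_trial(rng) for _ in range(n_trials)]
    return min(trials, key=evaluate_model)

best = random_search()
```

Tying the tied layers (d2=d3=d6) to a single sampled value keeps the search space two-dimensional, which is what makes this affordable compared to tuning all six dropout flags independently.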

Here is a link to a publication I found, a survey of dropout techniques, for anyone interested: https://arxiv.org/pdf/1904.13310.pdf


It would be great to know if high dropouts like 0.6 have a meaningful impact for DeepSpeech models. I have only tried up to 0.5, and that made our model worse.