Recommended hyperparameter ranges for fine-tuning

Hi,

When I continue training from the pre-trained English model 0.1.1 to add an Indian accent, I can tune the hyperparameters as needed.

As @kdavis said, the logic behind the hyperparameters is mostly hard-won trial and error. But I would like to know, from your experience, whether there is a recommended range for each hyperparameter. In which order should I tune them?

Hyperparameters from the pre-trained model:
train_batch_size 12
dev_batch_size 8
test_batch_size 8
epoch 13
dropout_rate 0.2367
default_stddev 0.046875

I guess the following hyperparameters don't need to be changed:
learning_rate 0.0001
n_hidden 2048
validation_step 1

Thanks,

Tan

If you don’t plan on changing the network architecture, then you should not touch default_stddev.

Depending on the memory available on your GPU(s), you can adjust train_batch_size, dev_batch_size, and test_batch_size to avoid out-of-memory problems.
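If it helps, here's a minimal Python sketch of that search: double the batch size until the first failure, then bisect. The `try_batch` callback is hypothetical; it's assumed to run one training step at the given batch size and raise `MemoryError` (or your framework's OOM error) when the batch doesn't fit:

```python
def max_batch_size(try_batch, start=1, limit=4096):
    """Largest batch size that try_batch() accepts without running out of memory.

    try_batch(n) is a hypothetical callback: it runs one training step at
    batch size n and raises MemoryError when n is too large.
    Returns 0 if even `start` does not fit.
    """
    lo, hi = 0, start
    # Phase 1: double until the first OOM (or the hard limit).
    while hi <= limit:
        try:
            try_batch(hi)
            lo, hi = hi, hi * 2   # hi fits, so try twice as much
        except MemoryError:
            break                 # hi is the first known-bad size
    else:
        return lo                 # never failed below the limit
    # Phase 2: bisect between the last good and the first bad size.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        try:
            try_batch(mid)
            lo = mid
        except MemoryError:
            hi = mid
    return lo
```

One caveat: with variable-length audio clips, memory use varies from batch to batch, so it's safer to leave some headroom below whatever value this finds.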

As to ranges, unfortunately, it depends.

For example, depending on how "noisy" your data is, you should tune dropout_rate up or down: the noisier the data, the lower dropout_rate should be, as the data regularizes the network itself. I don't know what the maximal value of dropout_rate should be for "clean" data, nor what the minimum should be for "noisy" data.
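If you'd rather search than guess, one simple approach is a coarse sweep over a few dropout values, scored by dev loss after a short training run. A sketch, where `dev_loss_after_short_run` is a hypothetical helper you'd write around your training loop:

```python
def pick_dropout(dev_loss_after_short_run,
                 candidates=(0.05, 0.1, 0.2, 0.3, 0.4)):
    """Return the candidate dropout_rate with the lowest dev loss.

    dev_loss_after_short_run(rate) is a hypothetical helper: it trains
    briefly at the given dropout_rate and returns the dev-set loss.
    The candidate values are illustrative, not a recommendation.
    """
    return min(candidates, key=dev_loss_after_short_run)
```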

As to order, I can only tell you what I did. First, I took n_hidden from the Baidu paper; then derived default_stddev from Xavier initialization; then set train_batch_size, dev_batch_size, and test_batch_size by finding the largest values that didn't hit OOM errors on our GPU(s); then set learning_rate through binary search; and finally set epoch by seeing when early stopping kicked in for our data sets.
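To make two of those steps concrete: the Glorot/Xavier stddev is a closed-form expression in the layer's fan-in and fan-out, and the learning-rate "binary search" can be realized as a ternary search in log space, assuming the dev loss is roughly unimodal in log(learning rate). The `dev_loss_at` helper and the fan values below are assumptions for illustration; whether this exact formula reproduces the released default_stddev of 0.046875 depends on the fans used, which aren't stated here:

```python
import math

def xavier_stddev(fan_in: int, fan_out: int) -> float:
    """Glorot/Xavier initialization: stddev = sqrt(2 / (fan_in + fan_out))."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def lr_search(dev_loss_at, lo=1e-6, hi=1e-1, iters=10):
    """Ternary search in log space for the learning rate.

    dev_loss_at(lr) is a hypothetical helper: it trains briefly at
    learning rate lr and returns the dev-set loss. Assumes the loss
    is roughly unimodal in log(lr).
    """
    llo, lhi = math.log(lo), math.log(hi)
    for _ in range(iters):
        m1 = llo + (lhi - llo) / 3.0
        m2 = lhi - (lhi - llo) / 3.0
        if dev_loss_at(math.exp(m1)) < dev_loss_at(math.exp(m2)):
            lhi = m2   # minimum lies in the lower two-thirds
        else:
            llo = m1   # minimum lies in the upper two-thirds
    return math.exp((llo + lhi) / 2.0)

print(xavier_stddev(2048, 2048))  # ~0.0221 for a square 2048-unit layer
```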


Thanks for your answer.

Where does dropout_rate stand in your order?
To speed up the fine-tuning process, did you train on a small percentage of the whole dataset? If so, what percentage, and how did you select it?

Did you tune the hyperparameters manually, or do you have an automated hyperparameter-tuning setup?

Tan

Oh yeah, dropout_rate was set before epoch and after learning_rate.

I trained on, as far as I remember, 10k audio clips to fine-tune. This number seemed big enough to reflect the behavior of the real data set, but small enough to work with.

I just happened upon 10k, but in retrospect my guess is that’s an approximation of the 16.5k you’d need to get a 99% confidence level with a 1% confidence interval on a population of 2.5 million[1], which is about how many audio clips we train on.
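For anyone who wants to check that figure: it follows from Cochran's sample-size formula with a finite-population correction, taking the worst-case proportion p = 0.5 and z ≈ 2.576 for 99% confidence:

```python
z = 2.576      # z-score for a 99% confidence level
e = 0.01       # 1% margin of error (confidence interval half-width)
p = 0.5        # worst-case proportion
N = 2_500_000  # population size: ~2.5M audio clips

n0 = (z ** 2) * p * (1 - p) / e ** 2  # Cochran's formula (infinite population)
n = n0 / (1 + (n0 - 1) / N)           # finite-population correction
print(round(n0), round(n))            # -> 16589 16480, i.e. ~16.5k
```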

I tuned manually as we, at the time, didn’t have the hardware required to do many runs.

If you end up using automated hyperparameter tuning, I’d love to hear about your setup, as I’d guess that we can improve our performance some.