RAdam optimizer

I see that the dev branch is now using the RAdam optimizer.

FYI, there is a newer, even better optimizer called Ranger: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d

It combines RAdam with Lookahead, a new idea from a paper out of Hinton's group. I don't have personal experience with it, but the paper looks very reasonable.
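
In case it helps the discussion, here is a rough sketch of what Lookahead does around a base optimizer. This is not the exact Ranger code from the post; it's a minimal PyTorch illustration, with `k` and `alpha` set to the paper's defaults and `torch.optim.Adam` standing in for RAdam if no RAdam implementation is at hand:

```python
import torch


class Lookahead:
    """Minimal sketch of Lookahead wrapping any base optimizer (e.g. RAdam)."""

    def __init__(self, base_optimizer, k=5, alpha=0.5):
        self.base, self.k, self.alpha = base_optimizer, k, alpha
        self.counter = 0
        # The extra memory cost: one "slow" copy of every trainable weight.
        self.slow_weights = [
            [p.detach().clone() for p in group["params"]]
            for group in base_optimizer.param_groups
        ]

    def zero_grad(self):
        self.base.zero_grad()

    def step(self):
        self.base.step()                  # normal "fast" update (RAdam/Adam/...)
        self.counter += 1
        if self.counter % self.k == 0:    # every k steps, sync slow and fast weights
            for group, slows in zip(self.base.param_groups, self.slow_weights):
                for p, slow in zip(group["params"], slows):
                    slow += self.alpha * (p.detach() - slow)  # slow <- slow + alpha * (fast - slow)
                    p.data.copy_(slow)                        # reset fast weights to the slow copy


# Usage sketch (model is any nn.Module):
# optimizer = Lookahead(torch.optim.Adam(model.parameters(), lr=1e-3), k=5, alpha=0.5)
```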

Thanks for pointing this out. However, it needs 2 copies of the model weights, which would use more GPU memory. I guess it is better for now to stick with RAdam.

Isn't the memory for keeping the weights negligible? Tacotron has about 7M parameters, Taco2 has about 20M. Stored as 4-byte floats, that's about 30MB and 80MB respectively. For any decent GPU that's negligible. It's also small compared to the memory needed for the training data.
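
Quick back-of-the-envelope check of those numbers, assuming float32 weights and the parameter counts above:

```python
# 4 bytes per float32 parameter; one extra copy of the weights for Lookahead.
for name, params in [("Tacotron", 7e6), ("Tacotron2", 20e6)]:
    extra_copy_mb = params * 4 / 1024 ** 2
    print(f"{name}: ~{extra_copy_mb:.0f} MB for one extra copy of the weights")
# Tacotron: ~27 MB, Tacotron2: ~76 MB
```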

You don't only keep the weights. You also run forward and backward passes on them. I'd guess it is almost 2x more memory, but I might be wrong. (I just skimmed the post.)
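
One way to settle it empirically would be to compare peak GPU memory over a few training steps with and without the wrapper. A sketch, assuming CUDA, a real training batch and loss, and the `Lookahead` class from the earlier snippet:

```python
import torch


def peak_memory_mb(model, optimizer, batch, steps=5):
    """Run a few optimizer steps and report peak allocated GPU memory in MB."""
    torch.cuda.reset_peak_memory_stats()
    for _ in range(steps):
        optimizer.zero_grad()
        loss = model(batch).mean()  # placeholder loss; use the real training loss here
        loss.backward()
        optimizer.step()
    return torch.cuda.max_memory_allocated() / 1024 ** 2
```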