I see that the dev branch is now using the RAdam optimizer.
FYI, there is a newer, reportedly even better optimizer called Ranger: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
It combines RAdam with the Lookahead idea from Hinton’s paper. I don’t have personal experience with it, but the paper looks very reasonable.
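
For anyone who wants a feel for the Lookahead part, here is a rough toy sketch in plain Python. It is not Ranger's actual code: plain gradient descent stands in for RAdam as the inner optimizer, and the function names and the values of `lr`, `k`, and `alpha` are my own assumptions, chosen only for illustration.

```python
def lookahead_minimize(grad, w0, lr=0.1, k=5, alpha=0.5, steps=100):
    """Toy Lookahead loop; plain gradient descent stands in for RAdam."""
    slow = list(w0)  # slow weights, updated once every k inner steps
    fast = list(w0)  # fast weights, updated by the inner optimizer
    for step in range(1, steps + 1):
        g = grad(fast)
        # Inner (fast) update: one plain gradient-descent step.
        fast = [w - lr * gi for w, gi in zip(fast, g)]
        if step % k == 0:
            # Move the slow weights a fraction alpha toward the fast weights,
            # then restart the fast weights from the new slow weights.
            slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
            fast = list(slow)
    return slow


def quadratic_grad(w):
    """Gradient of f(w) = sum(w_i ** 2), a hypothetical test objective."""
    return [2.0 * wi for wi in w]


result = lookahead_minimize(quadratic_grad, [3.0, -2.0])
print(result)  # both coordinates should end up close to 0
```

As I understand it, Ranger wraps RAdam in this kind of slow/fast weight scheme, so switching should mostly be a matter of swapping the optimizer class.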