Hello everyone!
Since I’ve got an RTX card, I can potentially do fp16 training with amp, right?
The question is, has anyone tried it already? Just wanna know if it’s worth trying. Can I start training with fp16 and then, let’s say, at 100k steps switch back to fp32?
Thank you!
Community member @repodiac did some experiments and contributed a pull request for AMP O1 optimization.
Most of the training code already supports fp16 training. You just need to set apex_amp_level="O1" or mixed_precision=true in the config file, depending on whether the training script uses Apex or native PyTorch AMP.
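For anyone unsure what the flag actually changes, here is a minimal, hypothetical sketch of a native PyTorch AMP training step (roughly what mixed_precision=true enables); the model, optimizer, and data below are toy placeholders, not the repo's actual training code:

```python
import torch
import torch.nn as nn

# Toy sketch of a native AMP (torch.cuda.amp) training step; placeholders only,
# not the TTS training script.
device = "cuda"
model = nn.Linear(80, 80).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(32, 80, device=device)
    target = torch.randn(32, 80, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass runs in fp16 where safe
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)               # unscales grads; skips the step on inf/NaN
    scaler.update()
```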
@erogol Thank you!
What’s your own experience with mixed_precision=true?
I’ve made multiple attempts: one of them suddenly ended with “Zero loss & Inf gradients” after around the 16th epoch, and the other for some reason dropped in quality all of a sudden while keeping the same number of frames per second (see pic.). Could this have something to do with AMP? Thank you!
I get the same thing with CUDA 11 and a 2080 Ti. Tried PyTorch 1.6 and 1.7.
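In case it helps with debugging: native AMP's GradScaler silently skips the optimizer step and lowers its loss scale whenever it sees inf/NaN gradients, so the “Inf gradients” symptom can come from the scaler itself. A generic way to check whether steps are being skipped (a variant of the loop body in the sketch above, not the actual training script):

```python
# Variant of the toy loop body above: detect steps that GradScaler skipped
# because the gradients contained inf/NaN values.
prev_scale = scaler.get_scale()
scaler.scale(loss).backward()
scaler.step(optimizer)                   # internally skipped if grads are inf/NaN
scaler.update()                          # scale is backed off after a skipped step
if scaler.get_scale() < prev_scale:
    print(f"step {step}: skipped, loss scale reduced to {scaler.get_scale()}")
```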
They released 1.7.1 mentioning some performance issues in 1.7.
Maybe using the latest PyTorch version might solve it. Otherwise, it’s a matter of finding the right LR that works.