Isn’t the memory needed to store the weights negligible? Tacotron has about 7M parameters and Tacotron 2 has about 20M. Stored as 4-byte floats, that’s about 30MB and 80MB, respectively. For any decent GPU that’s negligible. It’s also small compared to the memory needed for the training data.
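For reference, here is a quick sketch of that arithmetic in Python. The parameter counts are the rough figures quoted above, and the helper name `weight_memory_mb` is made up for illustration; 1 MB is taken as 10^6 bytes.

```python
def weight_memory_mb(num_params, bytes_per_param=4):
    """Memory for the weights alone, in MB (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

# Approximate parameter counts quoted in the post (not exact model sizes).
tacotron_mb = weight_memory_mb(7_000_000)
tacotron2_mb = weight_memory_mb(20_000_000)

print(f"Tacotron:   ~{tacotron_mb:.0f} MB")
print(f"Tacotron 2: ~{tacotron2_mb:.0f} MB")
```

So the "30MB" figure is really closer to 28MB, which doesn’t change the point.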
You don’t only keep the weights. You also run the forward and backward passes on them, which takes extra memory on top of the weights themselves. I’d guess it’s almost 2x more memory, but I might be wrong. (I just skimmed the post.)
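One way to read the "almost 2x" guess, as a rough sketch: assume the backward pass keeps one gradient value per weight, stored at the same precision. That assumption is mine, not something stated in the thread, and the helper name `weights_plus_grads_mb` is hypothetical. Activations saved during the forward pass are left out because they depend on batch size and sequence length.

```python
def weights_plus_grads_mb(num_params, bytes_per_value=4):
    """Weights plus one same-sized gradient per weight, in MB (10**6 bytes).

    Assumption: gradients use the same precision as the weights.
    Activations and optimizer state are deliberately not counted.
    """
    weights_bytes = num_params * bytes_per_value
    grads_bytes = num_params * bytes_per_value  # one gradient per weight
    return (weights_bytes + grads_bytes) / 1e6

# Same rough parameter counts as quoted earlier in the thread.
for name, n in [("Tacotron", 7_000_000), ("Tacotron 2", 20_000_000)]:
    print(f"{name}: ~{weights_plus_grads_mb(n):.0f} MB for weights + gradients")
```

Under that assumption the totals roughly double, which is still small next to the memory on a typical GPU.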