Hello everyone!
Since I have limited hardware and want to speed up training as much as I can, I first pretrain TTS on short utterances, and then switch to a dataset (same speaker) with longer utterances (and a smaller batch size).
The problem is that the stopnet got much worse after the switch and doesn't seem to be improving. I use a separate stopnet. Here is what the plots look like:
Is there any way to fix this? Can I somehow increase the LR of the stopnet, or freeze Tacotron's weights for a while?
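To make the question concrete, here is roughly what I mean, as a minimal sketch in plain PyTorch. Everything in it is an assumption for illustration: `TinyTacotron`, its `decoder` and `stopnet` attributes, the `set_decoder_frozen` helper, and the learning rates are hypothetical stand-ins, not the actual TTS code or its config options.

```python
import torch
from torch import nn


class TinyTacotron(nn.Module):
    """Hypothetical stand-in for the real model: a decoder plus a separate stopnet."""

    def __init__(self, dim=8):
        super().__init__()
        self.decoder = nn.Linear(dim, dim)
        self.stopnet = nn.Linear(dim, 1)

    def forward(self, x):
        h = self.decoder(x)
        # "Separate" stopnet: detach so the stop loss never reaches the decoder.
        stop_logits = self.stopnet(h.detach())
        return h, stop_logits


def set_decoder_frozen(model, frozen):
    """Hypothetical helper: switch gradient tracking for the decoder on or off."""
    for p in model.decoder.parameters():
        p.requires_grad = not frozen


torch.manual_seed(0)
model = TinyTacotron()

# Idea 1: one optimizer, two parameter groups, higher LR for the stopnet.
# The values 1e-4 and 1e-3 are arbitrary placeholders.
optimizer = torch.optim.Adam([
    {"params": model.decoder.parameters(), "lr": 1e-4},
    {"params": model.stopnet.parameters(), "lr": 1e-3},
])

# Idea 2: freeze everything except the stopnet for a while.
set_decoder_frozen(model, True)

decoder_before = model.decoder.weight.detach().clone()
stop_bias_before = model.stopnet.bias.detach().clone()

# One dummy training step on the stop loss only.
x = torch.randn(4, 8)
stop_target = torch.zeros(4, 1)
_, stop_logits = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(stop_logits, stop_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # frozen decoder params have no grad, so Adam skips them

# Later, to resume full training:
# set_decoder_frozen(model, False)
```

In this toy setup only the stopnet parameters move during the step, and the decoder can be unfrozen again with `set_decoder_frozen(model, False)`. Is something along these lines reasonable for the real model?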
Thank you!!