I am training Tacotron 2 with DDC and GST on the LJSpeech dataset.
I used the exact same config with GST turned off and got a sufficiently good TTS model. Now I am retraining with GST turned on; that is the only difference between the two configs.
Unexpectedly, somewhere between step 40k and step 59k the generated samples started to sound too loud. I am now at step 89k and the problem persists.
See the progression of samples: https://soundcloud.com/dzmitry-pletnikau/sets/ljspeech-tacotron2-ddc-gst-samples
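To put numbers on "too loud", a sketch like the following could compare peak and RMS levels of samples synthesized at different checkpoints. It assumes the samples are saved as 16-bit PCM WAV files. The function names `load_pcm16` and `level_stats` and the 0.999 clipping threshold are illustrative choices of mine, not part of any TTS library.

```python
import array
import math
import sys
import wave


def load_pcm16(path):
    """Read a 16-bit PCM WAV file into a list of floats in [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian machines.
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def to_dbfs(value):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def level_stats(samples, clip_threshold=0.999):
    """Return (peak_dbfs, rms_dbfs, clipped_fraction) for samples in [-1, 1].

    clip_threshold is an assumed cutoff for counting a sample as clipped.
    """
    if not samples:
        raise ValueError("no samples")
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold) / len(samples)
    return to_dbfs(peak), to_dbfs(rms), clipped
```

Calling `level_stats(load_pcm16(path))` on one sample per checkpoint should show whether the RMS level jumps between step 40k and step 59k, and a non-zero clipped fraction would suggest the output is saturating rather than just being louder.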
I wonder what may be wrong, or how I can debug this to find the cause. Any help will be greatly appreciated.