Tacotron 2 with DDC and GST trained on LJSpeech: samples become too loud as training progresses

Dzmitry_Pletnikau · May 13, 2021, 1:20am

I am training Tacotron 2 with DDC and GST on the LJSpeech dataset.

I’ve used exactly the same config with GST turned off and got a reasonably good TTS model. Now I am retraining with GST turned on; that is the only difference between the two configs.

Unexpectedly, somewhere between steps 40k and 59k the generated samples started to sound too loud. I am now at step 89k and the problem persists.

See the progression of samples: https://soundcloud.com/dzmitry-pletnikau/sets/ljspeech-tacotron2-ddc-gst-samples

I wonder what may be wrong, or how I can debug this to find the cause. Any help would be greatly appreciated.
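One way to make "too loud" concrete is to measure the peak and RMS level of each checkpoint's eval sample in dBFS and see at which step the numbers jump. The sketch below uses only the Python standard library and assumes mono 16-bit PCM WAV files; it is an illustrative helper, not part of any TTS toolkit.

```python
import array
import math
import sys
import wave


def to_dbfs(x):
    """Convert a linear amplitude in [0, 1] to dBFS (-inf for silence)."""
    return 20 * math.log10(x) if x > 0 else float("-inf")


def levels_dbfs(samples):
    """Return (peak_dbfs, rms_dbfs) for float samples in [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return to_dbfs(peak), to_dbfs(rms)


def read_wav_16bit(path):
    """Read a mono 16-bit PCM WAV file into floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")
    pcm.frombytes(raw)
    if sys.byteorder == "big":  # WAV data is little-endian
        pcm.byteswap()
    return [s / 32768.0 for s in pcm]
```

Usage: call `levels_dbfs(read_wav_16bit(path))` on the eval sample saved at each checkpoint (file names depend on your setup) and compare the values across steps. A peak near 0 dBFS together with a sharply higher RMS than earlier checkpoints would point to clipping or a genuine level jump rather than a perceptual impression.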

Dzmitry_Pletnikau · June 12, 2021, 10:56pm

This was caused by the eval set being auto-generated with a zero GST vector, which apparently corresponds to a style that distorts the voice a lot.
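To illustrate why a zero GST vector is not a neutral choice: a style embedding is normally built as a weighted combination of the learned style tokens, so an all-zero input need not match any style the decoder saw during training. The sketch below is a simplified, hypothetical combination (a weighted sum of tanh-squashed tokens, ignoring the attention layer's learned projections) on a random token table. It contrasts the zero vector with a uniform mix of all tokens, which is one possible alternative style input for eval samples.

```python
import math
import random


def style_from_token_weights(tokens, weights):
    """Combine style tokens into one style embedding.

    tokens:  list of equal-length vectors (the token table).
    weights: dict mapping token index -> scalar weight; unlisted tokens get 0.
    Each token is squashed with tanh before weighting (simplified GST-style mix).
    """
    dim = len(tokens[0])
    style = [0.0] * dim
    for idx, w in weights.items():
        for d, value in enumerate(tokens[idx]):
            style[d] += w * math.tanh(value)
    return style


def uniform_weights(num_tokens):
    """Equal weight on every token, summing to 1 like softmax attention weights."""
    return {i: 1.0 / num_tokens for i in range(num_tokens)}


# Hypothetical token table: 10 tokens of dimension 8, random values for illustration.
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 0.3) for _ in range(8)] for _ in range(10)]

# No weights at all gives the all-zero style vector (the problematic eval default).
zero_style = style_from_token_weights(tokens, {})
# A uniform mix of tokens stays inside the space spanned by the learned styles.
mean_style = style_from_token_weights(tokens, uniform_weights(len(tokens)))
```

In practice, deriving the eval style from a reference utterance of the training speaker, or from explicit token weights, is more likely to reflect what the model actually learned than an all-zero vector.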