I was working with a version of TTS that I cloned about a year ago and was very impressed by its out-of-the-box quality. I wanted to test the latest version with multi-speaker support, so I first ran a control experiment on LJSpeech 1.1, but the final validation samples after 1k epochs sound much worse than what the old version produced. I've tried matching the few config settings that differ (e.g. learning rate), but I am still getting pretty poor speech quality on all of LJSpeech after 1000 epochs.
I am now experimenting with the attention settings, but has anyone else run into this? I don't want to move on to LibriTTS until I am sure that I've got things working sensibly.
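
In case it helps anyone checking the same thing, here is a rough sketch of how the old and new configs could be compared key by key, so that no differing setting is missed when matching them by hand. The function names (`flatten`, `diff_configs`) and the example keys and values are hypothetical, not taken from the actual TTS config files. It assumes each config has already been loaded into a Python dict, e.g. with `json.load`; if the config files contain comments, those would need to be stripped first.

```python
MISSING = "<missing>"  # marker for a key present in only one config


def flatten(config, prefix=""):
    """Flatten nested dicts into a single dict with dotted key paths."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(old, new):
    """Return {key_path: (old_value, new_value)} for every differing setting."""
    old_flat, new_flat = flatten(old), flatten(new)
    diffs = {}
    for key in sorted(set(old_flat) | set(new_flat)):
        old_value = old_flat.get(key, MISSING)
        new_value = new_flat.get(key, MISSING)
        if old_value != new_value:
            diffs[key] = (old_value, new_value)
    return diffs


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    old_config = {"lr": 0.001, "audio": {"sample_rate": 22050}}
    new_config = {
        "lr": 0.0001,
        "audio": {"sample_rate": 22050},
        "use_speaker_embedding": True,
    }
    for key, (old_value, new_value) in diff_configs(old_config, new_config).items():
        print(f"{key}: {old_value!r} -> {new_value!r}")
```

Keys that exist only in the newer config are worth a close look, since their defaults may change behaviour even when every shared setting matches.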