Combining GST and multi-speaker for adaptation and prosody control

Has anybody tried combining GST (Global Style Tokens) with a multi-speaker model for speaker adaptation and prosody control? I am currently trying to train such a model on LibriTTS, but I cannot get it to work properly. Is it even worth trying?
