Combining GST and multi-speaker for adaptation and prosody control

Has anybody tried what the title describes, combining GST (global style tokens) with a multi-speaker model? I am currently trying to train on LibriTTS, but I cannot get it to work properly. Is it even worth trying?

Have you made any more progress? I also tried, and it did not work for me either :frowning: