I planned to use the multispeaker Tacotron 2 model (https://github.com/mozilla/TTS/wiki/Released-Models) for my use case, but the quality was not up to the mark, so I decided to fine-tune it. I have been struggling to do so. Can anyone guide me on how to fine-tune it with the VCTK dataset? My use case is real-time speech. It would be a great help if I could get a tutorial notebook for this.
What settings did you use in your config.json file? Can you share it?
And have you looked at the notebooks that handle multispeaker? I’ve yet to do much work with multispeaker myself, but those should be a good start, and because the code’s in front of you, you can follow it easily.