Hello, this is still in development, and we hope to achieve better results soon. However, you should not expect results comparable to a model trained on a few hours of your own voice. The initial idea is to generate data for training ASR systems, so we only need to vary the voices. If you have a lot of recorded speech, you can fine-tune this pre-trained model on your voice: basically, take this model and train it for some more steps on your recordings. The model already handles multiple speakers, so this should make training on a specific voice easier.
That said, we hope to improve the model so that its output sounds closer to the target speaker’s voice. So far, the model has only been trained on the VCTK dataset, which has only 109 speakers and a limited vocabulary. We are testing several possibilities, and once these tests are finished we intend to train the model on LibriTTS, which should give better results.