Dear all, I am new to TTS and deep learning. I have tried to train Tacotron 2 with a small (1.5-hour) Sinhala (Sri Lanka) dataset, and I think I am seeing overfitting, as expected. The results are attached below. I am training with phonemes. The final voice synthesis seems to be going in the right direction - some letters are understandable - but it is still gibberish.
I am working on creating a bigger dataset. In the meantime, is there anything I can try to get better results with the data I have?
I also tried Glow-TTS training. Since it trains for 10,000 epochs as opposed to just 1,000 for Tacotron, it seems much slower. Is there a way to prevent running the evaluation at the end of each epoch?
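For reference, the one thing I found in my config that looks related is the `run_eval` flag in `config.json`. I am not certain it fully skips the per-epoch evaluation in every version of the trainer, but setting it to false might be worth a try:

```json
{
  "run_eval": false
}
```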
I ran a similar experiment with Taiwanese.
While the vocoder part sounds great by itself with only two hours of data, the Tacotron text-to-mel part is bad. I think there is not enough data.