Dear All, I am new to TTS and deep learning. I have tried to train Tacotron 2 on a small (1.5 hours) Sinhala (Sri Lanka) dataset, and I think I am seeing overfitting, as expected. The results are attached below. I am training with phonemes. The final voice synthesis seems to be going in the right direction - some letters are understandable - but the output is still gibberish.
I am working on creating a bigger dataset. Meanwhile, is there something I can try to get better results with the data I have?