Help needed training with small dataset

pathnirvana · January 30, 2021, 9:17am

Dear All, I am new to tts and deep learning. I have tried to train the tacotron2 with a small (1.5 hours) Sinhala (Sri Lanka) dataset and I think I am seeing overfitting as expected. The results are attached below. I am training with the phenomes. The final voice synthesis seems to be going in the right direction - some letters are understandable - but gibberish

I am working on creating a bigger dataset. Meanwhile is there something I can try to get better results with the data I have?

erogol · January 31, 2021, 11:13pm

things look good to me. How is the final result ?

pathnirvana · February 1, 2021, 5:32am

The final voice synthesis seems to be going in the right direction - some letters are understandable - but gibberish

I also tried the glow-tts training - since the training is 10,000 epochs as opposed to just 1000 in tacotron it seems to be much slower - is there a way to prevent running the evaluation at the end of each epoch?

erogol · February 1, 2021, 10:25am

run_eval in config.json

Pak · February 10, 2021, 12:48pm

I make similar experiment with Taiwanese.
While the vocoder part with only two hours sounds great by self, the tacotron text to mel part is bad. think not enough data.

erogol · February 10, 2021, 1:23pm

if you share more info like your config, tensorboard etc. we can maybe help

pathnirvana · February 10, 2021, 2:44pm

@erogol is there a way to reduce the number of parameters/layer sizes in tacotron so it will not overfit small datasets?

I am making the bigger dataset now for our language, but it takes time.

erogol · February 11, 2021, 1:11pm

yes you need to set layer sizes manually in the source.

Topic		Replies	Views
Fine-Tuning Trained Model to New Dataset TTS (Text-to-Speech)	13	4899	August 22, 2019
Tacotron2: bad test synthesis results TTS (Text-to-Speech)	1	2384	March 1, 2020
Training suddenly dropping in quality TTS (Text-to-Speech)	20	2428	August 18, 2020
My latest results using private dataset trained Tacotron2 model with MelGAN vocoder TTS (Text-to-Speech)	14	2431	July 14, 2020
The trained tacotron2 model is 600 MB, however in the notebook its around 300MB TTS (Text-to-Speech)	2	646	February 9, 2021

Help needed training with small dataset

Related topics