Regarding the AnalyzeDataset notebook

Ole_Klett · August 11, 2021, 5:01pm

I tried AnalyzeDataset.ipynb.

Is there a way to tell exactly what to do to improve the dataset, or at least how to interpret the graphs properly?
Could the notebook be modified so that, given only a text file or a set of audio files, it picks out the parts needed to improve the model?

Best

Ole_Klett · August 13, 2021, 12:48pm

Is that a good dataset to begin with?

dkreutz · August 16, 2021, 2:23pm

With such a high number of short audio clips, the model might have problems when inferring longer phrases. Nevertheless, you can try training a DDC/DCA model; you should see whether it works well after approx. 20k steps.

For the thorsten-de voice we had success with a more Gaussian (bell-shaped) distribution: https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-information-microphone (please note that the graphs were generated with an older version of the dataset-analysis notebook).
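
To check how skewed a dataset is towards short clips without opening the notebook, something like the following rough sketch might help. It is not the notebook's own code: the folder name `wavs`, the 1-second bin width and the 2-second "short clip" threshold are all assumptions chosen for illustration, and it only reads plain PCM WAV files via the standard library.

```python
import wave
from collections import Counter
from pathlib import Path


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_histogram(durations, bin_width=1.0):
    """Count clips per duration bin; keys are the lower bin edges in seconds."""
    counts = Counter(int(d // bin_width) * bin_width for d in durations)
    return dict(sorted(counts.items()))


def share_of_short_clips(durations, threshold=2.0):
    """Fraction of clips shorter than `threshold` seconds (arbitrary example value)."""
    if not durations:
        return 0.0
    return sum(d < threshold for d in durations) / len(durations)


if __name__ == "__main__":
    # "wavs" is a hypothetical folder name; point it at your own dataset.
    durations = [wav_duration(p) for p in sorted(Path("wavs").glob("*.wav"))]
    # Print a simple text histogram, one row per 1-second bin.
    for start, count in duration_histogram(durations).items():
        print(f"from {start:.0f}s: {'#' * count} ({count})")
    print(f"clips under 2 s: {share_of_short_clips(durations):.0%}")
```

If most of the bars pile up in the first one or two bins, the dataset has the kind of short-clip-heavy shape described above; a bell-shaped spread across the bins is closer to what worked for thorsten-de.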
