Regarding the AnalyzeDataset notebook

I tried AnalyzeDataset.ipynb.

Is there a way to determine exactly what to do to improve the dataset, or at least to interpret the graphs properly?
Could this be modified so that it is given only a text file or a set of audio files and then selects the parts it needs to improve the model?

Best

Is that a good dataset to begin with?
[Diagram]

With such a high number of short audio clips, the model might have problems when inferring longer phrases. Nevertheless, you can try training a DDC/DCA model; you should be able to tell whether it works well after approx. 20k steps.
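If you want a number to go with the graph before training, a rough check is the share of clips below some length. The sketch below is not part of AnalyzeDataset.ipynb. It assumes the clips are PCM WAV files in a hypothetical `wavs/` folder, and the 2-second cut-off is an arbitrary, hypothetical threshold rather than a recommended value:

```python
import statistics
import wave
from pathlib import Path


def wav_duration(path: Path) -> float:
    """Duration of a WAV file in seconds (the stdlib `wave` module reads PCM WAV only)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def short_clip_report(durations, short_threshold=2.0):
    """Summarize how much of the dataset consists of short clips.

    short_threshold is in seconds and is a hypothetical cut-off; adjust it
    to whatever "short" means for your sentences.
    """
    if not durations:
        raise ValueError("no durations given")
    n_short = sum(1 for d in durations if d < short_threshold)
    return {
        "count": len(durations),
        "short_fraction": n_short / len(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Hypothetical layout: an LJSpeech-style folder of clips named wavs/*.wav.
    wav_dir = Path("wavs")
    durations = [wav_duration(p) for p in sorted(wav_dir.glob("*.wav"))]
    if durations:
        print(short_clip_report(durations))
```

A high `short_fraction` together with a low median is the situation described above: the model rarely sees long utterances during training.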

For the thorsten-de voice we had success with a more Gaussian (bell-shaped) distribution: https://github.com/thorstenMueller/deep-learning-german-tts/#dataset-information-microphone (please note that the graphs were generated with an older version of the dataset-analysis notebook).
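One way to put "more bell-shaped" into numbers is to bin the clip durations and compute their skewness. This is a minimal sketch, not what the notebook or the thorsten-de repository does; the bin width is a hypothetical choice, and skewness is only a rough indicator (near 0 for a symmetric shape, clearly positive when most clips are short with a long tail of longer ones):

```python
import math
import statistics
from collections import Counter


def duration_histogram(durations, bin_width=1.0):
    """Count clips per bin; bin k covers [k * bin_width, (k + 1) * bin_width) seconds."""
    if not durations:
        return []
    counts = Counter(math.floor(d / bin_width) for d in durations)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


def skewness(values):
    """Fisher-Pearson coefficient of skewness (population moments)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def print_histogram(hist, bin_width=1.0, width=40):
    """Print a simple text bar chart of the histogram."""
    if not hist:
        return
    peak = max(hist)
    for k, c in enumerate(hist):
        bar = "#" * round(width * c / peak) if peak else ""
        print(f"{k * bin_width:5.1f}s {c:6d} {bar}")
```

For example, `print_histogram(duration_histogram(durations))` prints one bar per second of clip length, where `durations` is a list of clip lengths in seconds, e.g. collected with the `wave` module.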
