The datasets are useless

Hey Bernardo,

Thanks for joining the Common Voice discourse and for your feedback regarding the dataset.

I wanted to highlight that the Common Voice validation guidelines encourage voice recordings made in real-world environments, so that speech-to-text (STT) models can be trained to understand how real people speak, but also within boundaries that support the vitality of the dataset.

Could you tell us more about the noise you encountered, and give more detail on how it affected overfitting in your model?
