How much data at maximum is collected from a single speaker?
Hello and welcome to the community!
Could you elaborate a bit more on your question? If you are asking whether there is a limit on the number of hours a person can record on the site, no, there is no limit.
Having said that, once a person has recorded a decent number of clips, we should aim to get other new and diverse voices to enrich the dataset.
Cheers.
Thanks Nukeador.
I want to understand: when a model such as DeepSpeech is trained, is there a limit on the number of hours a particular speaker can contribute?
For example, LibriSpeech is one of the datasets on which DeepSpeech can be trained, and in LibriSpeech one speaker contributes at most about 30 minutes out of a total of 1,000 hours. I want to understand the same thing here: if one person contributes 1,500 hours out of 10,000 hours, will that be good for the model, or bad because the model could overfit to that voice?
Thanks!
I’m moving this to the Deep Speech category, since we don’t do model training on Common Voice (we just collect the data and publish the dataset).
Someone over here will be better informed to answer your question.
The only limitation will come from your GPU’s RAM. That’s why the current DeepSpeech importers limit clips to around 10-15 seconds per WAV file, so you can fit multiple of them in each batch.
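If you want to apply a similar duration filter to your own clips before importing, here is a rough sketch (the 15-second cap and the `clips` folder are just example values, not DeepSpeech defaults):

```python
# Minimal sketch: skip WAV clips longer than a chosen duration before feeding
# them to an importer, so that batches fit into GPU RAM. The 15-second cap and
# the clips directory are example values, not DeepSpeech defaults.
import wave
from pathlib import Path

MAX_SECONDS = 15.0          # example cap, tune for your GPU memory
clips_dir = Path("clips")   # hypothetical folder of mono WAV files

kept, skipped = [], []
for wav_path in sorted(clips_dir.glob("*.wav")):
    with wave.open(str(wav_path), "rb") as wav_file:
        duration = wav_file.getnframes() / wav_file.getframerate()
    (kept if duration <= MAX_SECONDS else skipped).append(wav_path)

print(f"kept {len(kept)} clips, skipped {len(skipped)} over {MAX_SECONDS}s")
```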
From what we can gather from reports of people’s experiments, it can vary a lot from dataset to dataset. The official Common Voice splits remove all duplicates and keep only a single sentence per speaker (so far less than LibriSpeech’s 30 minutes), but some people have reported improvements in their models when raising that limit to more than one sentence per speaker. Your best bet is to experiment, although I’d guess that the extreme case of 1,500 hours out of 10,000 coming from a single person would not be good.
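If you want to run that experiment on a Common Voice release yourself, a minimal sketch for building a custom train file that keeps at most N clips per speaker could look like this (the column names follow the Common Voice TSV format, and MAX_CLIPS_PER_SPEAKER is just an experimental knob, not a recommended value):

```python
# Minimal sketch of an alternative split: keep at most N clips per speaker
# instead of the single sentence per speaker used by the official splits.
# The client_id column identifies the speaker in Common Voice TSV files;
# MAX_CLIPS_PER_SPEAKER is an experimental knob, not a recommendation.
import csv
from collections import defaultdict

MAX_CLIPS_PER_SPEAKER = 5   # try 1 (official behaviour), 5, 10, ... and compare

clips_per_speaker = defaultdict(int)
with open("validated.tsv", newline="", encoding="utf-8") as src, \
     open("train_custom.tsv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        speaker = row["client_id"]
        if clips_per_speaker[speaker] < MAX_CLIPS_PER_SPEAKER:
            clips_per_speaker[speaker] += 1
            writer.writerow(row)
```

Training on splits built with different caps and comparing error rates on the same held-out speakers is probably the cleanest way to see where overfitting to dominant voices starts to hurt.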