How much data at maximum is collected from a single speaker?
Hello and welcome to the community!
Could you elaborate a bit more on your question? If you are asking whether there is a limit on the number of hours a person can record on the site, no, there is no limit.
Having said that, once a person has recorded a decent number of clips, we should aim to get other new and diverse voices to enrich the dataset.
Cheers.
Thanks Nukeador.
I want to understand: when a model such as DeepSpeech is trained, is there a limit on the number of hours a particular speaker can contribute?
For example, LibriSpeech is one of the datasets on which DeepSpeech can be trained, and in LibriSpeech one speaker contributes at most about 30 minutes out of a total of 1,000 hours. I want to understand the same thing here: if one person contributes 1,500 hours out of 10,000 hours, will that be good for the model, or bad because the model could overfit to that voice?
Thanks!
I’m moving this to the Deep Speech category, since we don’t do model training on Common Voice (we just collect the data and publish the dataset).
Someone over here will be better informed to answer your question.
The only limitation will come from your GPU’s RAM. That’s why the current DeepSpeech importers limit clips to around 10-15 seconds per WAV file, so you can fit multiple of them in each batch.
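If you want to apply a similar duration filter to your own clips before importing, here is a rough sketch (the 15-second cap and the `clips` folder are just example values, not DeepSpeech defaults):

```python
# Minimal sketch: skip WAV clips longer than a chosen duration before feeding
# them to an importer, so that batches fit into GPU RAM. The 15-second cap and
# the clips directory are example values, not DeepSpeech defaults.
import wave
from pathlib import Path

MAX_SECONDS = 15.0          # example cap, tune for your GPU memory
clips_dir = Path("clips")   # hypothetical folder of mono WAV files

kept, skipped = [], []
for wav_path in sorted(clips_dir.glob("*.wav")):
    with wave.open(str(wav_path), "rb") as wav_file:
        duration = wav_file.getnframes() / wav_file.getframerate()
    (kept if duration <= MAX_SECONDS else skipped).append(wav_path)

print(f"kept {len(kept)} clips, skipped {len(skipped)} over {MAX_SECONDS}s")
```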
From what we can gather from reports of people’s experiments, it can vary a lot from dataset to dataset. The official Common Voice splits remove all duplicates and keep only a single sentence per speaker (so far less than LibriSpeech’s 30 minutes), but some people have reported improvements in their models when raising that limit to more than one sentence per speaker. Your best bet is to experiment, although I’d guess that the extreme case of 1,500 hours out of 10,000 coming from a single person would not be good.
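If you want to run that experiment on a Common Voice release yourself, a minimal sketch for building a custom train file that keeps at most N clips per speaker could look like this (the column names follow the Common Voice TSV format, and MAX_CLIPS_PER_SPEAKER is just an experimental knob, not a recommended value):

```python
# Minimal sketch of an alternative split: keep at most N clips per speaker
# instead of the single sentence per speaker used by the official splits.
# The client_id column identifies the speaker in Common Voice TSV files;
# MAX_CLIPS_PER_SPEAKER is an experimental knob, not a recommendation.
import csv
from collections import defaultdict

MAX_CLIPS_PER_SPEAKER = 5   # try 1 (official behaviour), 5, 10, ... and compare

clips_per_speaker = defaultdict(int)
with open("validated.tsv", newline="", encoding="utf-8") as src, \
     open("train_custom.tsv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
    writer.writeheader()
    for row in reader:
        speaker = row["client_id"]
        if clips_per_speaker[speaker] < MAX_CLIPS_PER_SPEAKER:
            clips_per_speaker[speaker] += 1
            writer.writerow(row)
```

Training on splits built with different caps and comparing error rates on the same held-out speakers is probably the cleanest way to see where overfitting to dominant voices starts to hurt.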