Just wanted to find out how bad it would be to use records in which more than one speaker is talking.
Let me explain a bit more. I extract speech data from YouTube based on the manually provided subtitles. To collect as much data as possible in a short time, I do almost no post-processing. Music, noise, and other acoustic effects are kept in; I'm guessing (and hoping) this will lead to a more robust model. Am I wrong?
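For context, the extraction step is essentially "cut the audio at the subtitle timestamps." A minimal sketch of that (the regex, helper names, and sample SRT below are my own illustration, not from any particular library):

```python
import re

# SRT-style timestamp line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def to_ms(h, m, s, ms):
    # Convert hours/minutes/seconds/millis strings to total milliseconds.
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def cue_spans(srt_text):
    """Return (start_ms, end_ms) for every subtitle cue in the file.

    Each span becomes one training record when the audio is sliced,
    regardless of how many speakers actually talk inside it.
    """
    spans = []
    for match in TIME_RE.finditer(srt_text):
        g = match.groups()
        spans.append((to_ms(*g[:4]), to_ms(*g[4:])))
    return spans

srt = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:03,600 --> 00:00:07,250
Two people may be talking in this span.
"""
print(cue_spans(srt))  # [(1000, 3500), (3600, 7250)]
```

Since the cue boundaries come only from the subtitle file, nothing in this pipeline knows whether one or several voices fall inside a span, which is exactly the situation my question is about.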
And since the subtitles carry no speaker information (who spoke when) and I leave them as they are, quite often multiple people's speech ends up in a single record. How bad is that (if it's bad at all)? After all, I'm going to add this data to a clean 300-hour dataset (one speaker per record).
Thank you all for the suggestions!