Wondering what you should do when you hear a recording with a cough/sneeze etc.?

Kusaha · August 1, 2017, 10:46pm

TLDR: If the recording is otherwise correct, vote YES!

A couple of days ago I asked “…if somebody laughs/snorts/sneezes etc. before/after/during the sentence what should we vote on?” on GitHub because I was genuinely curious, and thanks to Kelly Davis I got an answer:

Good question. In larger academic speech corpora the various non-speech acts (coughs, sneezes, breaths…) are transcribed into the text
Hello < cough > how are you?
and a speech-to-text system is trained to recognize and ignore the non-speech acts. However, we don’t have the ability to transcribe the non-speech acts into our text.
So the questions are:
Is it better to leave the non-speech acts in the audio with the understanding that a speech-to-text system will learn to ignore them as they don’t appear in the transcript?
or
Is it better to mark audio with non-speech acts as invalid, with the understanding that a speech-to-text system trained on such data would get confused by the presence of the non-speech acts?
I think, with the progress of speech-to-text systems, leaving the non-speech acts (coughs, sneezes, breaths…) in the audio is the way to go, despite the fact that they will not appear in the transcript.
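
For anyone curious what the difference looks like in practice, here is a rough sketch of the two transcript styles Kelly describes. The tag format is only an assumption modeled on the "Hello < cough > how are you?" example above, and the function names are made up for illustration; this is not how Common Voice or DeepSpeech actually process data.

```python
import re

# Assumed tag format: a single word in angle brackets, with optional spaces,
# e.g. "<cough>" or "< cough >". Real corpora may use other conventions.
NON_SPEECH_TAG = re.compile(r"<\s*\w+\s*>")


def has_non_speech(transcript: str) -> bool:
    """Return True if the transcript contains at least one non-speech tag."""
    return NON_SPEECH_TAG.search(transcript) is not None


def strip_non_speech(transcript: str) -> str:
    """Drop non-speech tags and collapse the whitespace they leave behind."""
    without_tags = NON_SPEECH_TAG.sub(" ", transcript)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    tagged = "Hello < cough > how are you?"
    # The tagged form is what a larger academic corpus might store;
    # the stripped form is the kind of plain sentence we read here.
    print(has_non_speech(tagged))
    print(strip_non_speech(tagged))
```

In our case only the plain sentence exists, so the cough is in the audio but nowhere in the text, which is exactly the situation the two questions above are about.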

Thank you!

mhenretty · August 2, 2017, 9:09am

Thanks for sharing this here @Kusaha!

