TLDR: If the recording is otherwise correct, vote YES!
A couple of days ago I asked “…if somebody laughs/snorts/sneezes etc. before/after/during the sentence what should we vote on?” on GitHub because I was genuinely curious, and thanks to Kelly Davis I got an answer:
Good question. In larger academic speech corpora, the various non-speech acts (coughs, sneezes, breaths…) are transcribed into the text:
Hello < cough > how are you?
and a speech-to-text system is trained to recognize and ignore the non-speech acts. However, we don’t have the ability to transcribe the non-speech acts into our text.
So the questions are:
Is it better to leave the non-speech acts in the audio with the understanding that a speech-to-text system will learn to ignore them as they don’t appear in the transcript?
Is it better to mark audio with non-speech acts as invalid, with the understanding that a speech-to-text system trained on such data would get confused by the presence of the non-speech acts?
I think, given the progress of speech-to-text systems, leaving the non-speech acts (coughs, sneezes, breaths…) in the audio is the way to go, even though they will not appear in the transcript.