Special tags in spontaneous speech mode

Libra · March 10, 2026, 11:13am

Transcribe → General guidance → Labeling noise events like coughing or laughing

How should other noises be labeled, given that only four special tags are listed right now: [laugh], [disfluency], [unclear], [noise]? Does this mean I should add a new special tag such as [coughing]?

If yes, how should I do that? Should I just add the tag, as in a folksonomy, or do I need to suggest it here or somewhere else and wait until it is approved and implemented (you have frontend translation for special tags, which won’t be possible with random user tags)? If I can define a new tag myself, can it be in my native language, or should it be in English only?

If the answer is no, how can I label other noises? Should I modify the [noise] special tag in some way, for example [noise-coughing], [noise_coughing], or [noise|coughing]?

bozden · March 10, 2026, 11:46am

Believe it or not, when SPS first came out as an alpha, I asked for a “[cough]” tag, as a person who does that a lot.

There are no guidelines for this, but here is what I can suggest:

Do not overcomplicate: most of the time a dataset user would just skip those parts.
The [xyz] pattern will be the main detector.
I’d use the `[cough]` label even if it is not defined.
Open an issue as a feature request for new tags if needed.
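
A rough sketch of why the `[xyz]` pattern is enough as a detector: a dataset user could pick up every bracketed tag with one regular expression and separate the four currently defined tags from ad hoc ones such as `[cough]`. The function name `split_tags` and the sample sentence are made up for illustration, and none of this is part of any Common Voice tooling.

```python
import re

# The four special tags listed in the guidelines at the time of this thread.
DEFINED_TAGS = {"laugh", "disfluency", "unclear", "noise"}

# Anything of the form [xyz] is treated as a tag (assumption: no nested brackets).
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def split_tags(transcript):
    """Return (defined, undefined) tag names found in a transcript, in order."""
    defined, undefined = [], []
    for name in TAG_PATTERN.findall(transcript):
        name = name.strip().lower()
        if name in DEFINED_TAGS:
            defined.append(name)
        else:
            undefined.append(name)
    return defined, undefined


# Hypothetical transcript, not taken from the dataset.
sample = "So I was [laugh] trying to say [cough] that it works [noise]"
defined, undefined = split_tags(sample)
print("defined:", defined)
print("undefined:", undefined)
```

An undefined tag still gets detected this way; it just lands in the second list, which could also serve as input for a feature request.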

There are some papers on arXiv (and elsewhere) that detail these. I guess we all need to read them for future plans…

Libra · March 11, 2026, 9:24pm

Of course I will not. In more than 95% of cases the tags that are already defined are enough for me. I just want to know what to do if one day I need others. In addition, it doesn’t make much sense to add many new tags when there is no full list of the tags in use, with explanations, as in the guidelines. So I don’t see any reason to do that right now.

Sorry, but what do you mean by that?

They definitely exist. The main problem, as far as I know, is that the tag list always depends on the project goals, which I’m not sure are well defined yet.

If you start reading or collecting them, I think it would be good to have a separate topic here for collecting and sharing articles and papers that are relevant to Common Voice.

bozden · March 11, 2026, 10:02pm

Exactly.

I mean, anything between […] would be regarded as a tag (i.e. not representing real speech, but something else). The important thing is that tags are systematic, whereas “errr” and “ummm” are not (how many “m”s are enough?).

In many cases (e.g. ASR training), these parts will be cut out, for instance during forced alignment. But what if a researcher is working on how people laugh?
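
To make the two use cases concrete, here is a minimal text-level sketch: one helper drops every bracketed tag to get a clean transcript (roughly what an ASR pipeline would want), the other counts a single tag for someone studying, say, laughter. The helper names and the sample sentence are hypothetical, and this only touches the transcript text; cutting the matching audio during forced alignment is a separate step not shown here.

```python
import re

# Assumption: tags look like [xyz] and are never nested.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def strip_tags(transcript):
    """Remove all bracketed tags and collapse the leftover whitespace."""
    without_tags = TAG_PATTERN.sub(" ", transcript)
    return " ".join(without_tags.split())


def count_tag(transcript, tag):
    """Count how often one tag (given without brackets) occurs."""
    names = TAG_PATTERN.findall(transcript)
    return sum(1 for name in names if name.strip().lower() == tag)


# Hypothetical transcript, not taken from the dataset.
sample = "well [disfluency] I think [laugh] it was fine [laugh]"
clean = strip_tags(sample)
laughs = count_tag(sample, "laugh")
print(clean)
print(laughs)
```

Neither helper would be reliable for free-form fillers like “errr” or “ummm”, which is the reason for keeping the tags systematic.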

I think that, except for the missing cough tag, these would be enough for such researchers.
