CVSS. Is offensive vocabulary strictly forbidden, or should it just not be addressed to anyone?

Libra · January 30, 2026, 9:00pm

I want to increase the diversity of language situations and words covered by the dataset, not only something specific.

Yeah, specific situations need specific solutions, but there are always some common solutions too, and I thought of CV as one of them. Isn’t it a goal of Common Voice to collect all types of spontaneous speech, to build a dataset for ASR models that can convert any possible speech to text? If not, what are your specific goals? Which cases are you trying to cover by collecting this dataset?

In the specific case I mentioned, I meant that once a model trained on CV can transcribe all possible words, you can easily add or train another model that analyzes the transcript according to your needs, such as finding offensive words, off-topic answers, etc.
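
To make that second step concrete, here is a rough sketch of the simplest possible post-filter, assuming the ASR output is already plain text. The word list and the `flag_terms` helper are hypothetical placeholders of mine, not anything from Common Voice; a real filter would need a proper per-language list or a trained classifier.

```python
import re

# Hypothetical placeholder list; a real setup would supply its own per-language list.
BLOCKLIST = {"darn", "heck"}


def flag_terms(transcript: str, blocklist: set[str]) -> list[str]:
    """Return the blocklisted words found in the transcript, in order of appearance."""
    # Lowercase and split into word tokens so punctuation does not hide a match.
    words = re.findall(r"\w+", transcript.lower())
    return [word for word in words if word in blocklist]


# The transcript would come from the ASR model; here it is a hard-coded example.
transcript = "Well, heck, that was a darn good answer."
print(flag_terms(transcript, BLOCKLIST))  # ['heck', 'darn']
```

The point is only that the filtering can live outside the ASR model, so the model itself can be trained to transcribe everything.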

Finally I see at least something about domains, thanks… Why is it so hard to find concrete information about them? On the page for adding sentences I see only a list of them, with no explanation of what they are used for, and in the guidelines they are just mentioned as the “theme of sentence”, without any explanation of this “feature”. But I think my misunderstanding here is too big for this reply. I will open a separate topic for it.
