I just started reviewing sentences in my mother tongue, Norwegian (Bokmål). There is an abundance of one-word sentences. Should I just approve all of these, or should they be rejected?

I don’t understand the language. The license is OK, and the text is prepared for ASR, which is also OK. How many are there? It is not ideal to have too many non-sentence single words or utterances appearing one after another.

Please see the following conversations/views: “Many single words in data set (UA) - is that OK?” (Common Voice): In my opinion (not official Mozilla policy), single words are fine if they frequently appear in single word sentences. For example interjections “yes”, “no”…

It seems like someone has added sentences from a public domain source with a script or something. There are also a bunch of sentences that read like “one seventy-two nine eight four”.

internetman: “one seventy-two nine eight four”

If you want recognition of numbers, these are good to add… They would be boring to record or listen to if repeated, so they must be mixed. Therefore dumping generated sentences is not encouraged. E.g. I lately implemented a script to pre-proce…

The sentences have been dumped from this source: https://www.nb.no/sprakbanken/en/resource-catalogue/oai-nb-no-sbr-54/ . So I guess it should be quite optimal. It is just sort of weird that most “sentences” (thus far) are not sentences. Quite exhausting to skip through hundreds of one-word sentences …

Sorry for the noise. It seems it was only the beginning of the dataset that consisted of one-word sentences and written-number sentences. How many people need to approve each sentence before it gets submitted, btw?

More noise: how about sentences which don’t really make sense? Like “Don’t forget to eat the dishes.”

Both in Sentence Collector and Common Voice Listen, 2 votes are needed to accept or reject, whichever comes first.
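To make the "whichever comes first" rule concrete, here is a minimal sketch in Python. It is not the actual Sentence Collector or Common Voice code; the function name `review_outcome` and the True/False vote encoding are made up for illustration, and only the threshold of 2 comes from the post above.

```python
from typing import Iterable, Optional

VOTES_NEEDED = 2  # threshold stated in the post above


def review_outcome(votes: Iterable[bool], needed: int = VOTES_NEEDED) -> Optional[str]:
    """Hypothetical helper: True is an approve vote, False is a reject vote.

    Votes are processed in the order they arrive. The first side to
    collect `needed` votes decides the outcome.
    """
    approvals = 0
    rejections = 0
    for vote in votes:
        if vote:
            approvals += 1
        else:
            rejections += 1
        if approvals >= needed:
            return "accepted"
        if rejections >= needed:
            return "rejected"
    return None  # still waiting for more votes
```

So a sentence with one approval and one rejection stays undecided until a third reviewer votes.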

internetman: “how about sentences which don’t really make sense? Like ‘Don’t forget to eat the dishes.’”

I read some conversations about these (at the start of the project), and it was decided not to include them.

"Sentences" with only one word

Common Voice

bozden (Bülent Özden) June 3, 2022, 1:51am 11

E.g. see: Validating meaningless sentences in the Sentence Collector?

Please search the forum to read more.
