Many single words in data set (UA) - is that OK?

Hello!

I see many single words in the Sentence Collector and in the actual dataset for voice recording in Ukrainian.

I do not see anything in the rules saying whether single words are allowed or whether only full sentences should be used; the only limit I can find is a length of <14 words. So what should be done with single words in the Sentence Collector and Common Voice?

Also, some of the words are names of cities, places, etc. Is that good or not?


Hey,

Thanks for your question.

Could you please reject the single words when you see them in the Sentence Collector?

For more details on the reviewing rules, check out this page: https://commonvoice.mozilla.org/sentence-collector/#/review

Thanks in advance,

In my opinion (not official Mozilla policy), single words are fine if they frequently appear as single-word sentences. Examples include interjections (“yes”, “no”), question words (“when?”, “where?”), and certain verb forms in languages without obligatory subject pronouns (“voy”, “пойду”, “dámelo”). You should not include anything that is a non-sentence, but plenty of valid sentences can appear as a single word. If the goal is to build systems that recognise speech, the sentences should look as much like speech as possible.

As for names of cities, they can be helpful to include, especially if they have a non-standard pronunciation, but it is better for them to appear in context.
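If it helps with reviewing, here is a minimal sketch of how one might pre-sort a list of candidate sentences so that single-word entries and overly long ones get a closer look. It is not part of the Sentence Collector; the function name `flag_for_review` and the reason labels are hypothetical, and the `max_words=14` default simply reflects the "<14 words" limit mentioned earlier in this thread. It only counts whitespace-separated words, so deciding whether a single word is a valid one-word sentence (like “пойду”) or a bare place name still needs a human.

```python
def flag_for_review(sentences, max_words=14):
    """Return (sentence, reason) pairs that deserve a manual look.

    Hypothetical helper: max_words=14 is an assumption taken from the
    "<14 words" limit quoted in this thread, not from official tooling.
    """
    flagged = []
    for sentence in sentences:
        words = sentence.split()  # naive whitespace tokenisation
        if not words:
            continue  # skip empty lines
        if len(words) == 1:
            # Could be a valid one-word sentence or a bare noun/place name.
            flagged.append((sentence, "single word"))
        elif len(words) >= max_words:
            # The rule as quoted is "fewer than 14 words".
            flagged.append((sentence, "too long"))
    return flagged


if __name__ == "__main__":
    sample = ["Київ", "Де?", "Я піду завтра."]
    for sentence, reason in flag_for_review(sample):
        print(f"{reason}: {sentence}")
```

Anything flagged as "single word" is only a candidate: a reviewer would still keep the ones that work as real utterances and reject the rest.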
