Many single words in data set (UA) - is that OK?

In my opinion (not official Mozilla policy), single words are fine if they frequently occur as one-word sentences. Examples include interjections such as “yes” and “no”; question words such as “when?” and “where?”; and certain verb forms in languages without obligatory subject pronouns, such as “voy” (Spanish, “I go”), “пойду” (Russian, “I'll go”) and “dámelo” (Spanish, “give it to me”). You should not include anything that is not a sentence, but plenty of valid sentences consist of a single word. If the goal is to build systems that recognise speech, the sentences should look as much like real speech as possible.

City names can be helpful to include, especially if their pronunciation is non-standard, but it is better to have them appear in context, within a sentence.
