bozden
(Bülent Özden)
June 3, 2022, 1:21am
Please see the following conversations/views:
In my opinion (not official Mozilla policy), single words are fine if they frequently appear in single word sentences. For example interjections “yes”, “no”, question words, “when?”, “where?”, certain verb forms in languages without obligatory subject pronouns “voy”, “пойду”, “dámelo”. You should not include anything that is a non-sentence, but plenty of valid sentences can appear as a single word. If the idea is to create systems to recognise speech, the idea is that the sentences should look …
I beg to differ.
I think the main reason people tended to include 5-10 word sentences in the past was that the language model in DeepSpeech (and in Coqui) is based on 5-gram models.
As I mentioned above, many utterances in our everyday conversations contain fewer than five words, and many of those are single words. For example, if you are commanding a machine, if you are asking a specific thing and getting answers…
There is nothing wrong with single words. Yes, you won’t dump the vocabulary, but anythi…
Although it is not official CV policy, we think they are perfectly fine, as long as they are conversational and not a dump of a whole dictionary.
These words are limited in number, say in the thousands, while the corpus will grow to millions of sentences. In the long run their share will become negligible. As I pointed out, it is best to mix them with longer ones.
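To put rough numbers on the "thousands versus millions" point, here is a minimal sketch. The figure of 5,000 single-word sentences and the corpus sizes are illustrative assumptions, not actual Common Voice statistics, and `single_word_share` is a hypothetical helper written only for this example.

```python
def single_word_share(single_word_count: int, corpus_size: int) -> float:
    """Return the fraction of corpus sentences that are single words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return single_word_count / corpus_size


# Assumption: the pool of conversational single-word sentences stays fixed
# at 5,000 while the rest of the text corpus keeps growing.
SINGLE_WORDS = 5_000

for corpus_size in (10_000, 100_000, 1_000_000, 10_000_000):
    share = single_word_share(SINGLE_WORDS, corpus_size)
    print(f"{corpus_size:>10,} sentences: {share:.2%} single words")
```

Under these assumptions the single-word share drops from half of a 10,000-sentence corpus to a small fraction of a percent at ten million sentences, which is why mixing them in with longer sentences should not skew the corpus in the long run.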