I’m working on improvements for my custom model. I was wondering how single words on each line of the language model corpus might affect accuracy. Has anybody tried this already?
I already have a large custom German language model built from sentences, but I would like to add some synonyms to it. Do I have to write new sentences containing each synonym?
Look online for CTC beam search and read the docs for kenlm; that will give you even more answers.
We are constantly talking about the probability of two or more words occurring together. If you have a single word per line, what is the probability of it occurring together with another word? So: no single words, but plausible sentences.
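To make that concrete, here is a minimal sketch using the KenLM Python bindings, assuming you have already built an ARPA model from your sentence corpus with `lmplz` (the file name `lm.arpa` and the example sentence are just placeholders). It shows that every word is scored given its preceding words, which is exactly the context a corpus of single-word lines can never provide:

```python
import kenlm

# Assumed: a model built beforehand from full sentences, e.g.
#   lmplz -o 3 < sentences.txt > lm.arpa
model = kenlm.Model('lm.arpa')

sentence = 'ich möchte einen termin vereinbaren'

# Total log10 probability of the whole sentence,
# including the <s> and </s> boundary markers.
print(model.score(sentence, bos=True, eos=True))

# Per-word breakdown: each word is scored given its preceding n-gram
# context. Single-word training lines only ever add unigram counts,
# so they barely help the scores the beam search actually uses.
for log_prob, ngram_length, oov in model.full_scores(sentence, bos=True, eos=True):
    print(log_prob, ngram_length, oov)
```

So if you want a synonym to be recognized, the model needs to have seen it in context, which in practice means adding sentences (even template-generated ones) that contain it.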