I’m working on improvements for my custom model. I was wondering how single words on each line of the language model corpus might affect accuracy. Has anybody tried this already?
I already have a large custom German language model built from sentences, but I would like to add some synonyms to it. Do I have to write new sentences containing each synonym?
Look online for CTC beam search and read the docs for kenlm; that will give you even more answers.
We are constantly talking about the probability of two or more words occurring together. If you have a single word per line, what is the probability of it occurring together with another word? So: no single words, but plausible sentences.
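To make that concrete, here is a minimal sketch using the KenLM Python bindings, assuming you have already built an ARPA model from your sentence corpus with `lmplz` (the file name `lm.arpa` and the example sentence are just placeholders). It shows that every word is scored given its preceding words, which is exactly the context a corpus of single-word lines can never provide:

```python
import kenlm

# Assumed: a model built beforehand from full sentences, e.g.
#   lmplz -o 3 < sentences.txt > lm.arpa
model = kenlm.Model('lm.arpa')

sentence = 'ich möchte einen termin vereinbaren'

# Total log10 probability of the whole sentence,
# including the <s> and </s> boundary markers.
print(model.score(sentence, bos=True, eos=True))

# Per-word breakdown: each word is scored given its preceding n-gram
# context. Single-word training lines only ever add unigram counts,
# so they barely help the scores the beam search actually uses.
for log_prob, ngram_length, oov in model.full_scores(sentence, bos=True, eos=True):
    print(log_prob, ngram_length, oov)
```

So if you want a synonym to be recognized, the model needs to have seen it in context, which in practice means adding sentences (even template-generated ones) that contain it.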