I tried again using the following commands:
kenlm/build/bin/lmplz --order 2 --text vocabulary.txt --arpa words.arpa
After generating the file words.arpa, I ran:
kenlm/build/bin/build_binary words.arpa lm.binary
I had to remove the -v trie option, because the error persisted with it. With that change, lm.binary was generated successfully. However, when I tried to generate the kenlm.scorer file using the command:
python3 generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocabulary.txt --package kenlm.scorer --default_alpha 0.75 --default_beta 1.18
I got the following error:
12281 unique words read from vocabulary file.
Doesn't look like a character based model.
Using detected UTF-8 mode: False
Error: Can't parse scorer file, invalid header. Try updating your scorer file.
Package created in kenlm.scorer
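One way to rule out a malformed lm.binary is to look at its first bytes. The sketch below is not part of KenLM or DeepSpeech: is_kenlm_binary is a hypothetical helper, and it assumes that KenLM binary files begin with the magic string "mmap lm http://kheafield.com/code format version", which should be checked against the KenLM sources for the version in use.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the file starts with the magic string
# that KenLM binaries are assumed to begin with, non-zero otherwise.
is_kenlm_binary() {
    # Assumption: this prefix matches the header written by build_binary.
    magic='mmap lm http://kheafield.com/code format version'
    # Read exactly as many bytes as the magic string is long and compare.
    prefix=$(head -c "${#magic}" "$1" 2>/dev/null)
    [ "$prefix" = "$magic" ]
}

# Example call:
#   is_kenlm_binary lm.binary && echo "header looks like a KenLM binary"
```

A passing check only says that the file header looks as expected; it says nothing about whether the model type or options are the ones generate_package.py requires.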
I've tried several ways to generate the lm.binary file, but even when that succeeds, for some reason I can't generate the kenlm.scorer file.
Thanks for your help