I’m migrating from 0.6 to 0.7 following the docs. I generated a new scorer with:
python3 generate_package.py --alphabet ../my/my-alphabets.txt --lm ../my/lm.binary --vocab ../my/my-vocab.txt --package ../my/my_lm.scorer --default_alpha 1.5 --default_beta 1.85
Unfortunately, training fails at startup with:
ValueError: Scorer initialization failed with error code 1
Inspecting the failing scorer file with the head command shows a three-line git-lfs header followed by binary data:
version https://git-lfs.github.com/spec/v1
oid sha256:94dc681c40e7731a82e9fbd7f6..d943a1d0411
size 1581036
EIRT▒?▒▒▒▒?▒~consstandard▒Z▒▒ ...
By contrast, the head of the default kenlm.scorer shows:
mmap lm http://kheafield.com/code format version 5
▒?▒▒▒▒▒▒?#▒4Ɯ, ..
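To make the comparison repeatable, here is a minimal sketch that reads the first bytes of a file and reports which of the two headers quoted above it starts with. The function name classify_header and the two constants are my own, not part of DeepSpeech or KenLM; the only inputs taken from this post are the two header strings.

```python
# Header prefixes copied from the two head outputs quoted in this post.
GIT_LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"
KENLM_MAGIC = b"mmap lm http://kheafield.com/code format version"


def classify_header(path):
    """Return 'git-lfs', 'kenlm' or 'unknown' based on the first bytes of the file.

    Hypothetical helper: it only compares prefixes and does not validate
    the rest of the file.
    """
    with open(path, "rb") as f:
        head = f.read(64)  # both prefixes are shorter than 64 bytes
    if head.startswith(GIT_LFS_MAGIC):
        return "git-lfs"
    if head.startswith(KENLM_MAGIC):
        return "kenlm"
    return "unknown"
```

Calling classify_header("../my/my_lm.scorer") and classify_header("../my/lm.binary") would show whether the git-lfs text is already present in the input given to --lm or only appears in the generated package.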
How could this bad header have been generated, and how can I fix it?