@othiele Thanks
python generate_lm.py --input_txt Vocabulary20052020.txt --output_dir . \
--top_k 500000 --kenlm_bins /home/ec2-user/LM/kenlm/build/bin/ \
--arpa_order 3 --max_arpa_memory "25%" --arpa_prune "1" \
--binary_a_bits 255 --binary_q_bits 8 --binary_type trie
This command ran successfully and created both vocab-500000.txt and lm.binary.
Then I ran:
python generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt \
--package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
After that, I got the following output:
334 unique words read from a vocabulary file.
Doesn't look like a character-based model.
Using detected UTF-8 mode: False
Package created in kenlm.scorer
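As a sanity check on the "334 unique words" line, here is a minimal sketch that counts the distinct whitespace-separated tokens in the vocabulary file. It assumes that generate_package.py counts words this way, which I have not verified against the script; the function name count_unique_words is just an illustrative helper, not part of DeepSpeech. If the number matches, the small count probably reflects a small input text rather than a problem with the scorer.

```python
def count_unique_words(vocab_path):
    """Count distinct whitespace-separated tokens in a text file."""
    words = set()
    with open(vocab_path, encoding="utf-8") as fin:
        for line in fin:
            # str.split() with no arguments splits on any whitespace
            # and ignores empty lines.
            words.update(line.split())
    return len(words)

# Usage (assumes the file from the first command is in the current directory):
# print(count_unique_words("vocab-500000.txt"))
```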
Thanks for the help, @othiele.