I am trying to generate a scorer with order 8. Here are the steps I followed:
- Compiled KenLM on my system to support up to max order 10.
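For reference, the rebuild followed the cmake route that KenLM's own error message suggests (the directory layout below is the usual out-of-source cmake build; adjust paths to your checkout):

```shell
# From the root of the kenlm checkout: out-of-source cmake build
# with the maximum n-gram order raised from the default 6 to 10.
mkdir -p build
cd build
cmake -DKENLM_MAX_ORDER=10 ..
make -j4
```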
- Ran the `generate_lm.py` script to create `lm.binary`. The script ran without any errors. Exact command used:

```
python3 generate_lm.py --input_txt corpus.txt --output_dir . --top_k 1000000 --kenlm_bins /path/to/kenlm/build/bin/ --arpa_order 8 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
```
- Ran the `generate_scorer_package` tool to create the scorer. Exact command used:

```
./generate_scorer_package --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-1000000.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
```
However, when running `generate_scorer_package`, I get the following error:
```
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 8 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. With cmake:
cmake -DKENLM_MAX_ORDER=10 ..
With Moses:
bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.
```
I have already compiled KenLM to support up to order 10, so why is it still throwing this error?
P.S. This is DeepSpeech v0.8.2, and when I generate a scorer with order 5, the same scripts work without any issues.