I want to generate a language model for only ~30 words.
Is that OK if I want my model to understand only these words?
Second, when I follow the 0.8.2 model building guide with this file, I end up with empty output for a while and then Segmentation fault: 1. I use a streaming model with a custom scorer; if I use the pretrained scorer, everything works fine.
Here is how I generate the scorer:
python generate_lm.py --input_txt lm.txt --output_dir lm --top_k 30 --kenlm_bins /Users/kirill/Developer/kenlm/build/bin --arpa_order 2 --max_arpa_memory "85%" --arpa_prune "0" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback
./native_client.amd64.cpu.osx/generate_scorer_package --alphabet ../lm/alphabet.txt --lm lm/lm.binary --vocab lm/vocab-40.txt --package /Users/kirill/Developer/kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
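For anyone trying to reproduce this, here is a minimal sketch of a sanity check on the inputs to the two commands above. It only counts the distinct words in the corpus and lists characters that the alphabet file does not cover; it does not diagnose the segmentation fault. The function names are hypothetical helpers and not part of DeepSpeech or KenLM, and the alphabet parsing assumes one label per line with `#` marking comment lines.

```python
from pathlib import Path


def unique_words(corpus_text):
    """Return the distinct whitespace-separated words in the corpus text."""
    return set(corpus_text.split())


def parse_alphabet(alphabet_text):
    """Collect labels from alphabet text.

    Assumption: one label per line, lines starting with '#' are comments.
    """
    labels = set()
    for line in alphabet_text.splitlines():
        if line.startswith("#"):
            continue
        if line:  # keeps a line holding a single space, drops empty lines
            labels.add(line)
    return labels


def missing_characters(words, labels):
    """Return characters used by the words that the alphabet does not list."""
    used = set("".join(words))
    return used - labels


def report(corpus_path, alphabet_path):
    """Print the word count and any uncovered characters for the given files."""
    words = unique_words(Path(corpus_path).read_text(encoding="utf-8"))
    labels = parse_alphabet(Path(alphabet_path).read_text(encoding="utf-8"))
    print(f"unique words in corpus: {len(words)}")
    print(f"characters missing from alphabet: {sorted(missing_characters(words, labels))}")
```

With the paths from the commands above, this could be called as report("lm.txt", "../lm/alphabet.txt"). The word count can then be compared against the value passed to --top_k.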