Error when creating my own scorer file

Hi lissyx,
I tried your latest version, 0.7.4, but I got the error below:
(ds-train-0.7.4) (base) chenyuz@chenyuz-y7000p:~/Desktop/ASR/mozilla/DeepSpeech/data/lm$ python3 generate_lm.py --input_txt vocabulary.txt --output_dir . --top_k 500000 --kenlm_bins /home/chenyuz/Desktop/ASR/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

Converting to lowercase and counting word occurrences ...
| | # | 28874 Elapsed Time: 0:00:00

Saving top 500000 words ...

Calculating word statistics ...
Your text file has 489984 words in total
It has 4173 unique words
Your top-500000 words are 100.0000 percent of all words
Your most common word "的" occurred 16667 times
The least common word in your top-k is "裹" with 1 times
The first word with 2 occurrences is "泱" at place 3954

Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /home/chenyuz/Desktop/ASR/mozilla/DeepSpeech/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


Traceback (most recent call last):
  File "generate_lm.py", line 210, in <module>
    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 97, in build_lm
    subprocess.check_call(subargs)
  File "/home/chenyuz/anaconda3/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/chenyuz/Desktop/ASR/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '.', '--memory', '85%', '--text', './lower.txt.gz', '--arpa', './lm.arpa', '--prune', '0', '0', '1']' died with <Signals.SIGSEGV: 11>.
(ds-train-0.7.4) (base) chenyuz@chenyuz-y7000p:~/Desktop/ASR/mozilla/DeepSpeech/data/lm$
