Hi, I couldn't find anything about DS_ERR_SCORER_NO_TRIE. It happens when running generate_package.py.
Any ideas?
The source code defines it as:

```
DS_ERR_SCORER_NO_TRIE, 0x2007, "Reached end of scorer file before loading vocabulary trie."
```
```
╰─ python3 generate_lm.py --input_txt ../test_final.txt --output_dir . --top_k 500000 --kenlm_bins /home/jyri/ieud/projects/deepspeech/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
Converting to lowercase and counting word occurrences ...
| |                              # | 10519777 Elapsed Time: 0:01:46
Saving top 500000 words ...
Calculating word statistics ...
Your text file has 64277395 words in total
It has 1561629 unique words
Your top-500000 words are 97.3967 percent of all words
Your most common word "ve" occurred 2110118 times
The least common word in your top-k is "paonun" with 3 times
The first word with 4 occurrences is "sallanıyordur" at place 483963
Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /home/jyri/Desktop/deepspeech/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 64227371 types 1599291
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:19191492 2:1390338944 3:2606885888 4:4171016960 5:6082733568
Statistics:
1 1599291 D1=0.555044 D2=1.38028 D3+=1.74191
2 19200762 D1=0.613093 D2=1.57666 D3+=1.85088
3 4887608/42923262 D1=0.890855 D2=1.26068 D3+=1.3821
4 2292531/46125983 D1=0.95244 D2=1.41186 D3+=1.46762
5 1075956/40097829 D1=0.960827 D2=1.53103 D3+=1.51773
Memory estimate for binary LM:
type      MB
probing  661 assuming -p 1.5
probing  818 assuming -r models -p 1.5
trie     378 without quantization
trie     227 assuming -q 8 -b 8 quantization
trie     323 assuming -a 22 array pointer compression
trie     173 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:19191492 2:307212192 3:97752160 4:55020744 5:30126768
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**##################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:19191492 2:307212192 3:97752160 4:55020744 5:30126768
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz	VmPeak:14132308 kB	VmRSS:112920 kB	RSSMax:3961724 kB	user:73.651	sys:13.0078	CPU:86.6589	real:75.457
Filtering ARPA file using vocabulary of top-k words ...
Reading ./lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Building lm.binary ...
Reading ./lm_filtered.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Identifying n-grams omitted by SRI
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Quantizing
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Writing trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
```
```
python3.6 generate_package.py --alphabet ../alphabet.txt --lm lm.binary --vocab vocab-500000.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
500000 unique words read from vocabulary file.
Doesn't look like a character based model.
Using detected UTF-8 mode: False
Traceback (most recent call last):
  File "generate_package.py", line 153, in <module>
    main()
  File "generate_package.py", line 148, in main
    args.default_beta,
  File "generate_package.py", line 58, in create_bundle
    if err != ds_ctcdecoder.DS_ERR_SCORER_NO_TRIE:
AttributeError: module 'ds_ctcdecoder' has no attribute 'DS_ERR_SCORER_NO_TRIE'
```
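My guess so far: the pip-installed ds_ctcdecoder wheel might be older than the generate_package.py script in my checkout, so the DS_ERR_SCORER_NO_TRIE constant just isn't exported by that build. Here's the check I'm using to see which constants my installed module actually has. The snippet below is a sketch demonstrating the pattern on a stdlib module (errno) so it runs anywhere; the commented line is the actual ds_ctcdecoder check, which only works where the package is installed:

```python
# Real check (needs ds_ctcdecoder installed):
#   import ds_ctcdecoder
#   print([n for n in dir(ds_ctcdecoder) if n.startswith("DS_ERR")])

# Same pattern on a stdlib module, as a runnable stand-in:
import errno

# dir() lists every name the module exports in this build.
exported = [name for name in dir(errno) if name.startswith("E")]
print("ENOENT" in exported)                       # this constant exists
print(hasattr(errno, "DS_ERR_SCORER_NO_TRIE"))    # this one does not
```

If the real check prints an empty (or short) list, that would confirm the installed decoder package predates the error-code constants and needs to be upgraded to match the checkout.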