Error while compiling generate_trie.cpp

I was trying to generate a trie for the new binary LM file I built from an extended vocabulary. When I try to compile, I get an error:

/native_client# g++ generate_trie.cpp 
In file included from generate_trie.cpp:5:0:
ctcdecode/scorer.h:9:10: fatal error: lm/enumerate_vocab.hh: No such file or directory
 #include "lm/enumerate_vocab.hh"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Please help, thanks in advance!

Please read the documentation; you are not building it correctly.


generate_trie is distributed as part of native_client.tar.xz, you should not need to rebuild it.
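If you just need the binary, something like this should do it (a minimal sketch, assuming util/taskcluster.py is run from the root of a checkout matching your release; alphabet.txt and lm.binary stand in for your own files):

python3 util/taskcluster.py --target native_client/   # downloads and unpacks native_client.tar.xz for your platform
./native_client/generate_trie alphabet.txt lm.binary trie   # args: <alphabet> <lm binary> <output trie>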


Hi @lissyx, I have generated my binary with the added vocabulary and also generated the trie, but during training with the new trie and new binary I get a segmentation fault (core dumped).

Please share more logs of the crash, ensure the ds_ctcdecoder version you are running matches your DeepSpeech training code and Python version, and cross-check with the released lm and trie files to verify the problem is not in yours.
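For instance, a rough sketch of those checks (0.6.x flag names; the CSV path is a placeholder, run from your DeepSpeech checkout):

pip list | grep ctcdecoder    # installed decoder version
cat VERSION                   # training code version; the two should match
# re-run the test step against the released files to rule out your own lm/trie:
python3 DeepSpeech.py --test_files /path/to/test.csv --lm_binary_path data/lm/lm.binary --lm_trie_path data/lm/trie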

Use standard file APIs to delete files with this prefix.
Epoch 0 |   Training | Elapsed Time: 3:06:01 | Steps: 569 | Loss: 27.668679                                                                                                        
Epoch 0 | Validation | Elapsed Time: 0:01:12 | Steps: 32 | Loss: 57.548914 | Dataset: /home/rbeigcn1134841d/Downloads/indian_dev.csv                                               
I Saved new best validating model with loss 57.548914 to: /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I FINISHED optimization in 3:07:15.253444
INFO:tensorflow:Restoring parameters from /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I0204 16:53:11.407811 140016252487488 saver.py:1284] Restoring parameters from /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I Restored variables from best validation checkpoint at /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060, step 236060
Testing model on /home/rbeigcn1134841d/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
Fatal Python error: Segmentation fault

Thread 0x00007f5735fff700 (most recent call first):
  File Segmentation fault (core dumped)

Right, so it crashes at the test step; it’s likely a mismatched ds_ctcdecoder. Please reinstall / upgrade it:

pip install --upgrade $(python util/taskcluster.py --decoder)


Everything is working fine with the given default configs (i.e. training, prediction)… It only breaks when I update vocabulary.txt.

So verify / share how you built it. Please look at data/lm; it should be self-contained and get you to a working LM.

Steps I did to generate the LM:

  1. Get alphabet.txt and add custom words, then build the ARPA model:
../kenlm/build/bin/lmplz --discount_fallback -o 3 < mirrorfit.txt > mirrorfit.arpa
=== 1/5 Counting and sorting n-grams ===
Reading /home/rbeigcn1134841d/Desktop/mark1/mfit-models/mirrorfit.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 200003 types 200006
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:2400072 2:9327010816 3:17488144384
Substituting fallback discounts for order 0: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 1: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 2: D1=0.5 D2=1 D3+=1.5
Statistics:
1 200006 D1=0.5 D2=1 D3+=1.5
2 400006 D1=0.5 D2=1 D3+=1.5
3 200003 D1=0.5 D2=1 D3+=1.5
Memory estimate for binary LM:
type       kB
probing 17969 assuming -p 1.5
probing 21094 assuming -r models -p 1.5
trie    10718 without quantization
trie     7864 assuming -q 8 -b 8 quantization 
trie    10132 assuming -a 22 array pointer compression
trie     7278 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz	VmPeak:26364176 kB	VmRSS:22528 kB	RSSMax:6075308 kB	user:0.858606	sys:1.26977	CPU:2.12839	real:2.06948
../kenlm/build/bin/build_binary -T -s mirrorfit.arpa mirrorfit.binary
Reading mirrorfit.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
  2. Generate the trie:
…/DeepSpeech/generate_trie alphabet.txt mirrorfit.binary trie

Is this flow correct, and do the outputs look right?

@lissyx please help, I have really been struggling with this error for the last few days. Thanks in advance!

It does not look like you are passing the correct arguments to build_binary.
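For reference, data/lm builds the released file as a quantized trie, along these lines (flag values from memory, so double-check against the generate_lm script in your checkout):

../kenlm/build/bin/build_binary -a 255 -q 8 trie mirrorfit.arpa mirrorfit.binary

Note also that build_binary’s -T option expects a temporary-file path argument, so in “-T -s” the -s is most likely consumed as that path rather than acting as a flag.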

Please also reply with the feedback on the other items I asked you to check.

I was following this tutorial: “How I trained a specific french model to control my robot”.

Creating the binary file:

/bin/bin/./build_binary -T -s words.arpa  lm.binary

He was building it the same way; please tell me what other parameters I should pass.

Also, if I use the lm.binary from the default package together with my trie, it gives a core dump. But if I use my binary together with the trie from the default package, it works. Not sure why?

@lissyx please advise.

Because you keep insisting on:

  • not listening to what I am telling you
  • refusing to give us feedback on updating the ds_ctcdecoder package
  • not using the proper documentation that I already linked you to

I will stop helping you until you actually read and act on what I asked earlier.

I am really sorry, I forgot to mention that I did run
pip install --upgrade $(python util/taskcluster.py --decoder)
but the issue still persists.

I am continuously referring to data/lm as well as the tutorial “How I trained a specific french model to control my robot” to generate the language model. Maybe I am missing some small detail, I am just not able to spot it.

Wait, can we avoid confusion and get the whole picture? It’s completely unclear what you are doing now.

Can you cross-check and share the output of pip list | grep ds_ctcdecoder as well as git describe --tags?

Do you have the crash with the default language model / trie? Since you failed to share proper status at first, I assumed you had a mismatch…

Please read the doc and the script. Don’t refer to anything else.

pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1-35-g94882fb

Yes, the default lm.binary and trie work perfectly fine.

OK, I will check the generate_lm script and read the docs.

Weird. If you were on the v0.6.1 tag, git describe would report exactly v0.6.1; v0.6.1-35-g94882fb means you are 35 commits past it, i.e. on master. You are going to have trouble if you don’t stick to matching versions.
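For instance, a minimal sketch of pinning everything to 0.6.1 (assuming you have no local changes you need to keep):

git checkout v0.6.1                                             # training code at the release tag
pip install --upgrade $(python util/taskcluster.py --decoder)   # ds_ctcdecoder matching that checkout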
