Error while compiling generate_trie.cpp

I was trying to generate a trie for the new binary LM file I built from an extended vocabulary. When I try to compile, I get an error:

/native_client# g++ generate_trie.cpp 
In file included from generate_trie.cpp:5:0:
ctcdecode/scorer.h:9:10: fatal error: lm/enumerate_vocab.hh: No such file or directory
 #include "lm/enumerate_vocab.hh"
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

Please help, thanks in advance!

Please read the documentation; you are not building it correctly.


generate_trie is distributed as part of native_client.tar.xz, you should not need to rebuild it.
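If you just need the binary, something like this should do it (a minimal sketch, assuming util/taskcluster.py is run from the root of a checkout matching your release; alphabet.txt and lm.binary stand in for your own files):

python3 util/taskcluster.py --target native_client/   # downloads and unpacks native_client.tar.xz for your platform
./native_client/generate_trie alphabet.txt lm.binary trie   # args: <alphabet> <lm binary> <output trie>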


Hi @lissyx, I have generated my binary with the added vocabulary and also generated the trie, but during training with the new trie and new binary I get a segmentation fault (core dumped).

Please share more logs of the crash, ensure the ds_ctcdecoder version you are running matches your DeepSpeech training code and Python version, and cross-check with the released lm and trie files to verify the problem is not in yours.
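For instance, a rough sketch of those checks (0.6.x flag names; the CSV path is a placeholder, run from your DeepSpeech checkout):

pip list | grep ctcdecoder    # installed decoder version
cat VERSION                   # training code version; the two should match
# re-run the test step against the released files to rule out your own lm/trie:
python3 DeepSpeech.py --test_files /path/to/test.csv --lm_binary_path data/lm/lm.binary --lm_trie_path data/lm/trie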

Use standard file APIs to delete files with this prefix.
Epoch 0 |   Training | Elapsed Time: 3:06:01 | Steps: 569 | Loss: 27.668679                                                                                                        
Epoch 0 | Validation | Elapsed Time: 0:01:12 | Steps: 32 | Loss: 57.548914 | Dataset: /home/rbeigcn1134841d/Downloads/indian_dev.csv                                               
I Saved new best validating model with loss 57.548914 to: /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I FINISHED optimization in 3:07:15.253444
INFO:tensorflow:Restoring parameters from /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I0204 16:53:11.407811 140016252487488 saver.py:1284] Restoring parameters from /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060
I Restored variables from best validation checkpoint at /home/rbeigcn1134841d/Desktop/mark1/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-236060, step 236060
Testing model on /home/rbeigcn1134841d/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00
Fatal Python error: Segmentation fault

Thread 0x00007f5735fff700 (most recent call first):
  File Segmentation fault (core dumped)

Right, so it crashes at the test step; it’s likely a mismatched ds_ctcdecoder. Please reinstall / upgrade it:

pip install --upgrade $(python util/taskcluster.py --decoder)


Everything is working fine with the given default configs (i.e. training, prediction)… It only breaks when I update vocabulary.txt.

So verify / share how you built it. Please look at data/lm; it should be self-contained and get you to a working LM.

Steps I did to generate the LM:

  1. Get alphabet.txt and add custom words, then build the ARPA model:
../kenlm/build/bin/lmplz --discount_fallback -o 3 < mirrorfit.txt > mirrorfit.arpa
=== 1/5 Counting and sorting n-grams ===
Reading /home/rbeigcn1134841d/Desktop/mark1/mfit-models/mirrorfit.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 200003 types 200006
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:2400072 2:9327010816 3:17488144384
Substituting fallback discounts for order 0: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 1: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 2: D1=0.5 D2=1 D3+=1.5
Statistics:
1 200006 D1=0.5 D2=1 D3+=1.5
2 400006 D1=0.5 D2=1 D3+=1.5
3 200003 D1=0.5 D2=1 D3+=1.5
Memory estimate for binary LM:
type       kB
probing 17969 assuming -p 1.5
probing 21094 assuming -r models -p 1.5
trie    10718 without quantization
trie     7864 assuming -q 8 -b 8 quantization 
trie    10132 assuming -a 22 array pointer compression
trie     7278 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz	VmPeak:26364176 kB	VmRSS:22528 kB	RSSMax:6075308 kB	user:0.858606	sys:1.26977	CPU:2.12839	real:2.06948
../kenlm/build/bin/build_binary -T -s mirrorfit.arpa mirrorfit.binary
Reading mirrorfit.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
  2. Generate the trie:
…/DeepSpeech/generate_trie alphabet.txt mirrorfit.binary trie

Is this flow correct, and do the outputs look right?

@lissyx please help, I have really been struggling with this error for the last few days. Thanks in advance!

It does not look like you are passing the correct arguments to build_binary.
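For reference, data/lm builds the released file as a quantized trie, along these lines (flag values from memory, so double-check against the generate_lm script in your checkout):

../kenlm/build/bin/build_binary -a 255 -q 8 trie mirrorfit.arpa mirrorfit.binary

Note also that build_binary’s -T option expects a temporary-file path argument, so in “-T -s” the -s is most likely consumed as that path rather than acting as a flag.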

Please also reply with the feedback on the other items I asked you to check.

I was following this tutorial: “How I trained a specific french model to control my robot”.

Creating the binary file:

/bin/bin/./build_binary -T -s words.arpa  lm.binary

He was building it the same way; please tell me what other parameters I should pass.

Also, if I use the lm.binary from the default package together with my trie, it gives a core dump. But if I use my binary together with the trie from the default package, it works. Not sure why?

@lissyx please advise.

Because you keep insisting on:

  • not listening to what I am telling you
  • refusing to give us feedback on updating the ds_ctcdecoder package
  • not using the proper documentation that I already linked you to

I will stop helping you until you actually read and act on what I asked earlier.

I am really sorry, I forgot to mention that I did run
pip install --upgrade $(python util/taskcluster.py --decoder)
but the issue still persists.

I am continuously referring to data/lm as well as the tutorial “How I trained a specific french model to control my robot” to generate the language model. Maybe I am missing some small detail, I am just not able to spot it.

Wait, can we avoid confusion and get the whole picture? It’s completely unclear what you are doing now.

Can you cross-check and share the output of pip list | grep ds_ctcdecoder as well as git describe --tags?

Do you have the crash with the default language model / trie? Since you failed to share proper status at first, I assumed you had a mismatch…

Please read the doc and the script. Don’t refer to anything else.

pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1-35-g94882fb

Yes, the default lm.binary and trie work perfectly fine.

OK, I will check the generate_lm script and read the docs.

Weird. If you were on the v0.6.1 tag, git describe would report exactly v0.6.1; v0.6.1-35-g94882fb means you are 35 commits past it, i.e. on master. You are going to have trouble if you don’t stick to matching versions.
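For instance, a minimal sketch of pinning everything to 0.6.1 (assuming you have no local changes you need to keep):

git checkout v0.6.1                                             # training code at the release tag
pip install --upgrade $(python util/taskcluster.py --decoder)   # ds_ctcdecoder matching that checkout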
