Error while compiling generate_trie.cpp

It does not look like you are passing the correct arguments to build_binary.

Please reply and provide feedback on the other items I asked you to check.
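
For reference, a KenLM run in the style of DeepSpeech's data/lm looks roughly like the sketch below. The flags and file names here are illustrative, not authoritative; check data/lm/generate_lm.py for the exact invocation in your release.

# Sketch: build an ARPA model, then convert it to a binary trie LM.
# vocabulary.txt / words.arpa are placeholder names taken from this thread.
lmplz --order 5 --text vocabulary.txt --arpa words.arpa
build_binary -a 255 -q 8 trie words.arpa lm.binary

Also note that build_binary's -T option expects a path argument for its temporary file, so a bare "-T -s" is probably not parsed the way you expect.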

I was following this tutorial: "How I trained a specific french model to control my robot - DeepSpeech"

Creating the binary file:

/bin/bin/./build_binary -T -s words.arpa  lm.binary

He was building it in the same way. If that is wrong, please tell me what params to pass instead.

Also, if I use the default package's lm.binary with my trie, it gives a core dump. But if I use the lm.binary and trie from the default package, it works. I am not sure why.

@lissyx, please advise.

Because you keep insisting on:

  • not listening to what I am telling you
  • refusing to give us feedback on updating the ds_ctcdecoder package
  • not using the proper documentation that I already linked you to.

I will stop helping you until you actually read and act on what I asked earlier.

I am really sorry, I forgot to mention that I did run
pip install --upgrade $(python util/taskcluster.py --decoder)
but the issue still persists.

I am constantly referring to data/lm as well as the tutorial "How I trained a specific french model to control my robot" to generate the language model. Maybe I am missing some small thing; I am just not able to spot it.

Wait, can we avoid confusion and get the whole picture? It's completely unclear what you are doing now.

Can you cross-check and share pip list | grep ds_ctcdecoder as well as git describe --tags?

Do you have the crash with the default language model / trie? Since you failed to share proper status at first, I assumed you had a mismatch…

Please read the doc and the script. Don't refer to anything else.

pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1-35-g94882fb

Yes, the default lm.binary and trie are working perfectly fine.

OK, I will check the generate_lm script and read the docs.

Weird. If you were on v0.6.1, git describe would print the tag alone; the -35-g94882fb suffix means you are 35 commits past it. This shows you are on master, so you're going to have trouble if you don't stick to matching versions.
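
A minimal way to get back onto the matching release (a sketch, assuming a clean working tree in your DeepSpeech checkout):

# Pin the checkout to the release tag, then reinstall the matching decoder.
git checkout v0.6.1
pip install --upgrade $(python util/taskcluster.py --decoder)
git describe --tags   # should now print exactly v0.6.1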

I will check out that tag and let you know.

Now I have matching versions:
pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1

Now, after generating the LM, I am getting the following error while training from the 0.6.1 checkpoint with my added vocabulary and my LM binary and trie.

Command:

python3 DeepSpeech.py \
  --train_files /home/Downloads/indian_train.csv \
  --dev_files /home/Downloads/indian_dev.csv \
  --test_files /home/Downloads/indian_test.csv \
  --n_hidden 2048 \
  --train_batch_size 20 \
  --dev_batch_size 10 \
  --test_batch_size 10 \
  --epochs 1 \
  --learning_rate 0.0001 \
  --export_dir /home/Desktop/mark3/trieModel/ \
  --checkpoint_dir /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --cudnn_checkpoint /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt \
  --lm_binary_path /home/Desktop/mark3/mfit-models/lm.binary \
  --lm_trie_path /home/Desktop/mark3/mfit-models/trie

The error comes after training, once the dev phase ends, while it is evaluating on test.csv:

I Restored variables from best validation checkpoint at /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-234353, step 234353
Testing model on /home/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00                                                                                                 Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultSegmentation faultSegmentation fault

Segmentation faultThread 0x

Segmentation fault (core dumped)

There's something broken in your CTC decoder setup / trie production…

Can you please tell us exactly how you proceed? I'm really starting to lose patience here.

What are the sizes of:

  • your vocabulary.txt file
  • your lm.binary file
  • your trie file

Can you ensure you used exactly the same alphabet file?

Maybe there’s something bogus in your dataset.
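
A quick way to collect all three at once (assuming you run it from the directory that holds the files):

ls -lh vocabulary.txt lm.binary trie   # human-readable sizes of all three artifacts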

vocabulary.txt = 1.7MB
lm.binary = 20.1MB
trie = 80 Bytes

So you failed at generating the trie file: an 80-byte trie is far too small to hold your vocabulary. Since you have not yet shared how you generate it, we can't help you…

I took generate_trie from native_client.amd64.cpu.linux.tar.xz and ran

./generate_trie ../data/alphabet.txt lm.binary trie

to generate the trie.

So that's not the alphabet you are using for the training?!

--alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt does not look like the same path as ../data/alphabet.txt.
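
If in doubt, diff the two files (paths taken from your own posts) and verify they are identical:

diff /home/Desktop/mark3/mfit-models/alphabet.txt ../data/alphabet.txt   # no output means they match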

I am sorry, I used the same one, i.e. /home/Desktop/mark3/mfit-models/alphabet.txt. I pasted the wrong path here, sorry.
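
So the actual command, with the training alphabet (reconstructed from the paths above), was:

./generate_trie /home/Desktop/mark3/mfit-models/alphabet.txt lm.binary trie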

Also, I checked the size of the default lm.binary in ./data/lm: it's 945MB. Is there some issue with mine being only 20MB?