Error while compiling generate_trie.cpp

I will check out that tag and let you know.

Now I have matched versions:
pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1

Now, after generating the LM, I am also getting this error while training from the checkpoint with my added vocabulary, my LM binary, and my trie.

Command:

python3 DeepSpeech.py \
  --train_files /home/Downloads/indian_train.csv \
  --dev_files /home/Downloads/indian_dev.csv \
  --test_files /home/Downloads/indian_test.csv \
  --n_hidden 2048 \
  --train_batch_size 20 \
  --dev_batch_size 10 \
  --test_batch_size 10 \
  --epochs 1 \
  --learning_rate 0.0001 \
  --export_dir /home/Desktop/mark3/trieModel/ \
  --checkpoint_dir /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --cudnn_checkpoint /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt \
  --lm_binary_path /home/Desktop/mark3/mfit-models/lm.binary \
  --lm_trie_path /home/Desktop/mark3/mfit-models/trie
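Before launching a run like this, a quick pre-flight check can confirm that every file the flags point at actually exists and is readable. This is a hypothetical helper, not part of DeepSpeech; the paths are just the ones from the command above:

```shell
#!/bin/sh
# Hypothetical pre-flight check: verify each file passed to DeepSpeech.py
# exists and is readable before starting a long training run.
check_files() {
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "missing or unreadable: $f" >&2
      return 1
    fi
  done
  echo "all files present"
}

# Example usage with the paths from the training command:
# check_files /home/Desktop/mark3/mfit-models/alphabet.txt \
#             /home/Desktop/mark3/mfit-models/lm.binary \
#             /home/Desktop/mark3/mfit-models/trie
```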

The error occurs at the end of training, after the dev phase, while it is evaluating test.csv:

I Restored variables from best validation checkpoint at /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-234353, step 234353
Testing model on /home/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00                                                                                                 Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultSegmentation faultSegmentation fault

Segmentation faultThread 0x

Segmentation fault (core dumped)

There's something wrong in your CTC decoder setup / trie production that is broken…

Can you please tell us exactly how you proceeded? I'm really starting to lose patience here.


What are the sizes of:

  • your vocabulary.txt file
  • your lm.binary file
  • your trie file

Can you ensure you used exactly the same alphabet file?

Maybe there's something bogus in your dataset.

vocabulary.txt = 1.7MB
lm.binary = 20.1MB
trie = 80 Bytes

So you failed at generating the trie file. Since you have not yet shared how you did that, we can't help you…

I took generate_trie from native_client.amd64.cpu.linux.tar.xz and ran ./generate_trie ../data/alphabet.txt lm.binary trie to generate the trie.
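Given that the resulting trie turned out to be only 80 bytes, a size sanity check right after running generate_trie would have caught the failure early. This is a hypothetical check; the 1 KB threshold is an arbitrary assumption, not an official limit:

```shell
#!/bin/sh
# Hypothetical sanity check: a trie of only ~80 bytes almost certainly
# means generate_trie failed or received arguments in the wrong order.
# The 1024-byte threshold is an arbitrary assumption for illustration.
trie_looks_valid() {
  size=$(wc -c < "$1" | tr -d ' \n')
  if [ "$size" -lt 1024 ]; then
    echo "suspicious trie: only $size bytes" >&2
    return 1
  fi
  echo "trie size ok: $size bytes"
}

# Example usage:
# trie_looks_valid trie || echo "regenerate the trie before training"
```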

So that's not the alphabet you are using for training?!

--alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt does not look like the same path as ../data/alphabet.txt…

I am sorry, I did use the same one, i.e. /home/Desktop/mark3/mfit-models/alphabet.txt.
I pasted a different path here by mistake, sorry.

Also, I checked the size of the default lm.binary in ./data/lm: it's 945 MB. Is there some issue with mine being only 20 MB?

The LM size depends on your vocabulary file size, so it might be normal if yours is small.

Can we please avoid constant round trips and get a clear view all at once?

Share the exact and accurate command lines, as well as ls output for each of the involved files…

(mark3) root@computer:/home/computer/Desktop/mark3# ls
customModels  DeepSpeech  indianModel  kenlm  mfit-models  namesModel  tensorflow  trieModel
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/lmplz --discount_fallback --text mirrorfit.txt --arpa words.arpa --o 3
=== 1/5 Counting and sorting n-grams ===
Reading /home/computer/Desktop/mark3/mfit-models/mirrorfit.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 200003 types 200006
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:2400072 2:9327010816 3:17488144384
Substituting fallback discounts for order 0: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 1: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 2: D1=0.5 D2=1 D3+=1.5
Statistics:
1 200006 D1=0.5 D2=1 D3+=1.5
2 400006 D1=0.5 D2=1 D3+=1.5
3 200003 D1=0.5 D2=1 D3+=1.5
Memory estimate for binary LM:
type       kB
probing 17969 assuming -p 1.5
probing 21094 assuming -r models -p 1.5
trie    10718 without quantization
trie     7864 assuming -q 8 -b 8 quantization 
trie    10132 assuming -a 22 array pointer compression
trie     7278 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz	VmPeak:26372372 kB	VmRSS:22700 kB	RSSMax:6075336 kB	user:0.876577	sys:1.34088	CPU:2.21748	real:2.16143
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/build_binary -T -s words.arpa  lm.binary
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS

The command below doesn't give any output; it just creates the trie file.

(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../DeepSpeech/generate_trie alphabet.txt lm.binary trie

DeepSpeech directory:

(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls
bazel.patch                                                     DeepSpeech.py       libdeepspeech.so                      requirements.txt
bin                                                             doc                 LICENSE                               runNameTrieModel.sh
build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR  Dockerfile          myDataset                             stats.py
CODE_OF_CONDUCT.md                                              evaluate.py         native_client                         SUPPORT.rst
CONTRIBUTING.rst                                                evaluate_tflite.py  native_client.amd64.cpu.linux.tar.xz  taskcluster
data                                                            examples            __pycache__                           transcribe.py
deepspeech                                                      generate_trie       README.mozilla                        util
deepspeech-0.6.1-checkpoint                                     GRAPH_VERSION       README.rst                            VERSION
deepspeech-0.6.1-checkpoint.tar.gz                              images              RELEASE.rst
deepspeech.h                                                    ISSUE_TEMPLATE.md   requirements_eval_tflite.txt

Please let me know if I forgot to mention anything.

ls -hal, otherwise it's useless.

I still don't know your alphabet, vocabulary, and new trie file sizes…

I'm pretty sure you are missing a trie command-line parameter here.
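For what it's worth, KenLM's build_binary accepts an explicit data-structure argument (`trie` or `probing`, defaulting to `probing` when omitted), and its `-T` option takes a path argument for temporary files, so `-T -s` in the transcript above likely consumed `-s` as that path. A hedged sketch of the build sequence, using flags I believe match the DeepSpeech 0.6-era docs (verify against data/lm in your checkout before relying on them):

```shell
# Sketch only: flag values are assumptions, check data/lm/README in your tree.
../kenlm/build/bin/lmplz --order 3 --discount_fallback \
    --text mirrorfit.txt --arpa words.arpa

# Note the explicit "trie" data-structure argument; without it,
# build_binary falls back to the default "probing" format.
../kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary

../DeepSpeech/generate_trie alphabet.txt lm.binary trie
ls -hal lm.binary trie   # the trie should be far larger than 80 bytes
```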

(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls -hal
total 657M
drwxr-xr-x 15 root            root               4.0K Feb  6 11:10 .
drwxrwxr-x 10 computer      computer    4.0K Feb  6 10:36 ..
-rw-r--r--  1 root            root                11K Feb  5 15:51 bazel.patch
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 bin
-rw-r--r--  1 root            root                173 Feb  5 15:51 build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR
-rw-r--r--  1 root            root                 60 Feb  5 15:51 .cardboardlint.yml
-rw-r--r--  1 root            root                691 Feb  5 15:51 CODE_OF_CONDUCT.md
-rwxr-xr-x  1 root            root                933 Feb  5 15:51 .compute
-rw-r--r--  1 root            root               2.1K Feb  5 15:51 CONTRIBUTING.rst
drwxr-xr-x  5 root            root               4.0K Feb  5 15:51 data
-rwxr-xr-x  1 syslog          Unix Group\nogroup 892K Jan 10 22:47 deepspeech
drwxr-xr-x  2             501 staff              4.0K Feb  6 14:17 deepspeech-0.6.1-checkpoint
-rw-rw-r--  1 computer       computer    613M Jan 23 17:35 deepspeech-0.6.1-checkpoint.tar.gz
-rw-r--r--  1 syslog          Unix Group\nogroup 8.4K Jan 10 22:45 deepspeech.h
-rwxr-xr-x  1 root            root                42K Feb  5 15:51 DeepSpeech.py
drwxr-xr-x  3 root            root               4.0K Feb  5 15:51 doc
-rw-r--r--  1 root            root               6.5K Feb  5 15:51 Dockerfile
-rwxr-xr-x  1 root            root               6.8K Feb  5 15:51 evaluate.py
-rw-r--r--  1 root            root               4.6K Feb  5 15:51 evaluate_tflite.py
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 examples
-r-xr-xr-x  1 syslog          Unix Group\nogroup 2.0M Jan 10 22:47 generate_trie
drwxr-xr-x  9 root            root               4.0K Feb  5 16:47 .git
-rw-r--r--  1 root            root                148 Feb  5 15:51 .gitattributes
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 .github
-rw-r--r--  1 root            root                474 Feb  5 15:51 .gitignore
-rw-r--r--  1 root            root                123 Feb  5 15:51 .gitmodules
-rw-r--r--  1 root            root                  2 Feb  5 15:51 GRAPH_VERSION
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 images
-rw-r--r--  1 root            root               1.2K Feb  5 15:51 ISSUE_TEMPLATE.md
-r-xr-xr-x  1 syslog          Unix Group\nogroup  34M Jan 10 22:47 libdeepspeech.so
-rw-r--r--  1 syslog          Unix Group\nogroup  17K Jan 10 22:45 LICENSE
drwxr-xr-x  3 computer computer    4.0K Jan 29 11:11 myDataset
drwxr-xr-x  9 root            root               4.0K Feb  5 15:51 native_client
-rw-rw-r--  1 computer computer    6.5M Feb  6 10:21 native_client.amd64.cpu.linux.tar.xz
drwxr-xr-x  2 root            root               4.0K Feb  6 10:36 __pycache__
-rw-r--r--  1 root            root                18K Feb  5 15:51 .pylintrc
-rw-r--r--  1 syslog          Unix Group\nogroup 1.2K Jan 10 22:45 README.mozilla
-rw-r--r--  1 root            root               5.0K Feb  5 15:51 README.rst
-rw-r--r--  1 root            root                437 Feb  5 15:51 .readthedocs.yml
-rw-r--r--  1 root            root                438 Feb  5 15:51 RELEASE.rst
-rw-r--r--  1 root            root                115 Feb  5 15:51 requirements_eval_tflite.txt
-rw-r--r--  1 root            root                340 Feb  5 15:51 requirements.txt
-rwxr-xr-x  1 computer     computer     869 Feb  6 11:08 runNameTrieModel.sh
-rw-r--r--  1 root            root               1.2K Feb  5 15:51 stats.py
-rw-r--r--  1 root            root               1.6K Feb  5 15:51 SUPPORT.rst
drwxr-xr-x  2 root            root                20K Feb  5 15:51 taskcluster
-rw-r--r--  1 root            root               2.5K Feb  5 15:51 .taskcluster.yml
-rwxr-xr-x  1 root            root               7.6K Feb  5 15:51 transcribe.py
-rw-r--r--  1 root            root                326 Feb  5 15:51 .travis.yml
drwxr-xr-x  3 root            root               4.0K Feb  6 10:36 util
-rw-r--r--  1 root            root                  6 Feb  5 15:51 VERSION
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ls -hal
total 44M
drwxr-xr-x  2 root            root            4.0K Feb  6 16:22 .
drwxrwxr-x 10 computer      computer 4.0K Feb  6 10:36 ..
-rw-r--r--  1 root            root             329 Jan 30 10:30 alphabet.txt
-rw-r--r--  1 root            root             20M Feb  6 16:21 lm.binary
-rw-r--r--  1 root            root            1.7M Jan 31 09:24 mirrorfit.txt
-rw-r--r--  1 root            root              80 Feb  6 16:22 trie
-rw-r--r--  1 root            root             23M Feb  6 16:18 words.arpa


It's not like I told you to look at that script…