Error while compiling generate_trie.cpp

pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1-35-g94882fb

Yes, the default lm.binary and trie are working perfectly fine.

Ok, I will check the generate_lm script and look at the docs.

Weird. If you are on v0.6.1, you should not have that tag. This shows you are on master, so you’re going to have trouble if you don’t stick to matching versions.


I will check out that tag and let you know.
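The mismatch above is visible in the `git describe --tags` output: a suffix like `-35-g94882fb` means the checkout is 35 commits past the tag, i.e. on master. A minimal Python sketch of that check (the helper names are mine, not part of DeepSpeech):

```python
import re

def is_exact_tag(describe_output: str) -> bool:
    """True if `git describe --tags` points exactly at a release tag,
    i.e. there is no `-<commits>-g<hash>` suffix after the version."""
    return re.fullmatch(r"v?\d+\.\d+\.\d+", describe_output.strip()) is not None

def matches_package(describe_output: str, package_version: str) -> bool:
    """Compare the checked-out tag against the installed
    ds_ctcdecoder version (e.g. 'v0.6.1' vs '0.6.1')."""
    return (is_exact_tag(describe_output)
            and describe_output.strip().lstrip("v") == package_version.strip())

# 'v0.6.1-35-g94882fb' is 35 commits past the tag -> mismatch.
print(matches_package("v0.6.1-35-g94882fb", "0.6.1"))  # False
print(matches_package("v0.6.1", "0.6.1"))              # True
```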

Now I have matched versions:
pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1

git describe --tags
v0.6.1

Now, after generating the LM, I am getting this error while training from the checkpoint with my added vocabulary, my LM binary, and my trie.

Command:

python3 DeepSpeech.py \

  --train_files /home/Downloads/indian_train.csv \
  --dev_files /home/Downloads/indian_dev.csv \
  --test_files /home/Downloads/indian_test.csv \
  --n_hidden 2048 \
  --train_batch_size 20 \
  --dev_batch_size 10 \
  --test_batch_size 10 \
  --epochs 1 \
  --learning_rate 0.0001 \
  --export_dir /home/Desktop/mark3/trieModel/ \
  --checkpoint_dir /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --cudnn_checkpoint /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
  --alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt \
  --lm_binary_path /home/Desktop/mark3/mfit-models/lm.binary \
  --lm_trie_path /home/Desktop/mark3/mfit-models/trie

The error comes after training ends and the dev epoch finishes: it crashes while testing on test.csv.

I Restored variables from best validation checkpoint at /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-234353, step 234353
Testing model on /home/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00                                                                                                 Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultSegmentation faultSegmentation fault

Segmentation faultThread 0x

Segmentation fault (core dumped)

There’s something broken in your CTC decoder setup / trie production…

Can you please tell us exactly how you proceed? I’m really starting to lose patience here.


What are the sizes of:

  • your vocabulary.txt file
  • your lm.binary file
  • your trie file

Can you ensure you used exactly the same alphabet file?

Maybe there’s something bogus in your dataset.

vocabulary.txt = 1.7MB
lm.binary = 20.1MB
trie = 80 Bytes
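An 80-byte trie next to a 20MB lm.binary is the red flag here: generate_trie must have bailed out almost immediately. A quick sanity check could compare each artifact against a rough lower bound (the thresholds below are my guesses, not official numbers):

```python
import os

# Rough lower bounds (assumptions): a trie built from a ~1.7MB
# vocabulary should be far larger than a few hundred bytes, so an
# 80-byte file almost certainly means generation failed.
MIN_SIZES = {
    "lm.binary": 1 << 20,  # >= 1 MiB
    "trie": 1 << 10,       # >= 1 KiB
}

def check_artifacts(model_dir: str) -> list:
    """Return the names of files that look missing or truncated."""
    suspicious = []
    for name, min_bytes in MIN_SIZES.items():
        path = os.path.join(model_dir, name)
        if not os.path.exists(path) or os.path.getsize(path) < min_bytes:
            suspicious.append(name)
    return suspicious

# Example (directory from this thread):
# print(check_artifacts("/home/Desktop/mark3/mfit-models"))
```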

So you failed at generating the trie file. Since you have not yet shared how you do that, we can’t help you …

I took generate_trie from native_client.amd64.cpu.linux.tar.xz and ran ./generate_trie ../data/alphabet.txt lm.binary trie to generate the trie.

So that’s not the alphabet you are using for the training?!

--alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt does not look like the same path as ../data/alphabet.txt

I am sorry, I used the same one, i.e. /home/Desktop/mark3/mfit-models/alphabet.txt.
I pasted the other path here by mistake, sorry.

Also, I checked the size of the default lm.binary in ./data/lm: it’s 945MB. Is there some issue with mine being only 20MB?

LM size depends on your vocabulary file size, so it might be normal if yours is small.

Please can we avoid constant round trips and get a clear view at once?

Share exact and accurate command line as well as ls for each of the involved files…

(mark3) root@computer:/home/computer/Desktop/mark3# ls
customModels  DeepSpeech  indianModel  kenlm  mfit-models  namesModel  tensorflow  trieModel
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/lmplz --discount_fallback --text mirrorfit.txt --arpa words.arpa --o 3
=== 1/5 Counting and sorting n-grams ===
Reading /home/computer/Desktop/mark3/mfit-models/mirrorfit.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 200003 types 200006
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:2400072 2:9327010816 3:17488144384
Substituting fallback discounts for order 0: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 1: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 2: D1=0.5 D2=1 D3+=1.5
Statistics:
1 200006 D1=0.5 D2=1 D3+=1.5
2 400006 D1=0.5 D2=1 D3+=1.5
3 200003 D1=0.5 D2=1 D3+=1.5
Memory estimate for binary LM:
type       kB
probing 17969 assuming -p 1.5
probing 21094 assuming -r models -p 1.5
trie    10718 without quantization
trie     7864 assuming -q 8 -b 8 quantization 
trie    10132 assuming -a 22 array pointer compression
trie     7278 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz	VmPeak:26372372 kB	VmRSS:22700 kB	RSSMax:6075336 kB	user:0.876577	sys:1.34088	CPU:2.21748	real:2.16143
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/build_binary -T -s words.arpa  lm.binary
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS

The command below doesn’t give any output; it just creates the trie file:

(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../DeepSpeech/generate_trie alphabet.txt lm.binary trie
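Since generate_trie prints nothing on success, a silent failure is easy to miss. A small wrapper (my own sketch, not part of DeepSpeech) could surface the exit code and refuse a suspiciously tiny output file:

```python
import os
import subprocess

def run_generate_trie(binary, alphabet, lm, out_trie, min_bytes=1024):
    """Run generate_trie and fail loudly if it exits non-zero or
    produces a suspiciously tiny output file (min_bytes is a guess)."""
    result = subprocess.run([binary, alphabet, lm, out_trie],
                            capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError("generate_trie failed (%d): %s"
                           % (result.returncode, result.stderr.strip()))
    size = os.path.getsize(out_trie)
    if size < min_bytes:
        raise RuntimeError("trie is only %d bytes - likely broken" % size)
    return size

# Example (paths from this thread):
# run_generate_trie("../DeepSpeech/generate_trie",
#                   "alphabet.txt", "lm.binary", "trie")
```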

DeepSpeech directory:

(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls
bazel.patch                                                     DeepSpeech.py       libdeepspeech.so                      requirements.txt
bin                                                             doc                 LICENSE                               runNameTrieModel.sh
build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR  Dockerfile          myDataset                             stats.py
CODE_OF_CONDUCT.md                                              evaluate.py         native_client                         SUPPORT.rst
CONTRIBUTING.rst                                                evaluate_tflite.py  native_client.amd64.cpu.linux.tar.xz  taskcluster
data                                                            examples            __pycache__                           transcribe.py
deepspeech                                                      generate_trie       README.mozilla                        util
deepspeech-0.6.1-checkpoint                                     GRAPH_VERSION       README.rst                            VERSION
deepspeech-0.6.1-checkpoint.tar.gz                              images              RELEASE.rst
deepspeech.h                                                    ISSUE_TEMPLATE.md   requirements_eval_tflite.txt

Please let me know if I forgot to mention anything.

Use ls -hal, otherwise it’s useless.

I still don’t know your alphabet, vocabulary and new trie file sizes…

I’m pretty sure you lack a trie command line parameter here.

(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls -hal
total 657M
drwxr-xr-x 15 root            root               4.0K Feb  6 11:10 .
drwxrwxr-x 10 computer      computer    4.0K Feb  6 10:36 ..
-rw-r--r--  1 root            root                11K Feb  5 15:51 bazel.patch
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 bin
-rw-r--r--  1 root            root                173 Feb  5 15:51 build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR
-rw-r--r--  1 root            root                 60 Feb  5 15:51 .cardboardlint.yml
-rw-r--r--  1 root            root                691 Feb  5 15:51 CODE_OF_CONDUCT.md
-rwxr-xr-x  1 root            root                933 Feb  5 15:51 .compute
-rw-r--r--  1 root            root               2.1K Feb  5 15:51 CONTRIBUTING.rst
drwxr-xr-x  5 root            root               4.0K Feb  5 15:51 data
-rwxr-xr-x  1 syslog          Unix Group\nogroup 892K Jan 10 22:47 deepspeech
drwxr-xr-x  2             501 staff              4.0K Feb  6 14:17 deepspeech-0.6.1-checkpoint
-rw-rw-r--  1 computer       computer    613M Jan 23 17:35 deepspeech-0.6.1-checkpoint.tar.gz
-rw-r--r--  1 syslog          Unix Group\nogroup 8.4K Jan 10 22:45 deepspeech.h
-rwxr-xr-x  1 root            root                42K Feb  5 15:51 DeepSpeech.py
drwxr-xr-x  3 root            root               4.0K Feb  5 15:51 doc
-rw-r--r--  1 root            root               6.5K Feb  5 15:51 Dockerfile
-rwxr-xr-x  1 root            root               6.8K Feb  5 15:51 evaluate.py
-rw-r--r--  1 root            root               4.6K Feb  5 15:51 evaluate_tflite.py
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 examples
-r-xr-xr-x  1 syslog          Unix Group\nogroup 2.0M Jan 10 22:47 generate_trie
drwxr-xr-x  9 root            root               4.0K Feb  5 16:47 .git
-rw-r--r--  1 root            root                148 Feb  5 15:51 .gitattributes
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 .github
-rw-r--r--  1 root            root                474 Feb  5 15:51 .gitignore
-rw-r--r--  1 root            root                123 Feb  5 15:51 .gitmodules
-rw-r--r--  1 root            root                  2 Feb  5 15:51 GRAPH_VERSION
drwxr-xr-x  2 root            root               4.0K Feb  5 15:51 images
-rw-r--r--  1 root            root               1.2K Feb  5 15:51 ISSUE_TEMPLATE.md
-r-xr-xr-x  1 syslog          Unix Group\nogroup  34M Jan 10 22:47 libdeepspeech.so
-rw-r--r--  1 syslog          Unix Group\nogroup  17K Jan 10 22:45 LICENSE
drwxr-xr-x  3 computer computer    4.0K Jan 29 11:11 myDataset
drwxr-xr-x  9 root            root               4.0K Feb  5 15:51 native_client
-rw-rw-r--  1 computer computer    6.5M Feb  6 10:21 native_client.amd64.cpu.linux.tar.xz
drwxr-xr-x  2 root            root               4.0K Feb  6 10:36 __pycache__
-rw-r--r--  1 root            root                18K Feb  5 15:51 .pylintrc
-rw-r--r--  1 syslog          Unix Group\nogroup 1.2K Jan 10 22:45 README.mozilla
-rw-r--r--  1 root            root               5.0K Feb  5 15:51 README.rst
-rw-r--r--  1 root            root                437 Feb  5 15:51 .readthedocs.yml
-rw-r--r--  1 root            root                438 Feb  5 15:51 RELEASE.rst
-rw-r--r--  1 root            root                115 Feb  5 15:51 requirements_eval_tflite.txt
-rw-r--r--  1 root            root                340 Feb  5 15:51 requirements.txt
-rwxr-xr-x  1 computer     computer     869 Feb  6 11:08 runNameTrieModel.sh
-rw-r--r--  1 root            root               1.2K Feb  5 15:51 stats.py
-rw-r--r--  1 root            root               1.6K Feb  5 15:51 SUPPORT.rst
drwxr-xr-x  2 root            root                20K Feb  5 15:51 taskcluster
-rw-r--r--  1 root            root               2.5K Feb  5 15:51 .taskcluster.yml
-rwxr-xr-x  1 root            root               7.6K Feb  5 15:51 transcribe.py
-rw-r--r--  1 root            root                326 Feb  5 15:51 .travis.yml
drwxr-xr-x  3 root            root               4.0K Feb  6 10:36 util
-rw-r--r--  1 root            root                  6 Feb  5 15:51 VERSION
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ls -hal
total 44M
drwxr-xr-x  2 root            root            4.0K Feb  6 16:22 .
drwxrwxr-x 10 computer      computer 4.0K Feb  6 10:36 ..
-rw-r--r--  1 root            root             329 Jan 30 10:30 alphabet.txt
-rw-r--r--  1 root            root             20M Feb  6 16:21 lm.binary
-rw-r--r--  1 root            root            1.7M Jan 31 09:24 mirrorfit.txt
-rw-r--r--  1 root            root              80 Feb  6 16:22 trie
-rw-r--r--  1 root            root             23M Feb  6 16:18 words.arpa