I will check out that tag and will let you know.
Now I have matched versions:
pip list | grep ds_ctcdecoder
ds-ctcdecoder 0.6.1
git describe --tags
v0.6.1
Also, after generating the LM, I am getting this error while training from the checkpoint with my added vocabulary, LM binary, and trie.
Command:
python3 DeepSpeech.py \
--train_files /home/Downloads/indian_train.csv \
--dev_files /home/Downloads/indian_dev.csv \
--test_files /home/Downloads/indian_test.csv \
--n_hidden 2048 \
--train_batch_size 20 \
--dev_batch_size 10 \
--test_batch_size 10 \
--epochs 1 \
--learning_rate 0.0001 \
--export_dir /home/Desktop/mark3/trieModel/ \
--checkpoint_dir /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
--cudnn_checkpoint /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/ \
--alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt \
--lm_binary_path /home/Desktop/mark3/mfit-models/lm.binary \
--lm_trie_path /home/Desktop/mark3/mfit-models/trie \
The error comes after training ends and the dev phase completes, while it is evaluating test.csv:
I Restored variables from best validation checkpoint at /home/Desktop/mark3/DeepSpeech/deepspeech-0.6.1-checkpoint/best_dev-234353, step 234353
Testing model on /home/Downloads/indian_test.csv
Test epoch | Steps: 0 | Elapsed Time: 0:00:00 Fatal Python error: Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultSegmentation faultSegmentation fault
Segmentation faultThread 0x
Segmentation fault (core dumped)
There's something wrong in your CTC decoder setup / trie production…
Can you please tell us exactly how you proceed? I'm really starting to lose patience here.
What are the sizes of:
- your vocabulary.txt file
- your lm.binary file
- your trie file
Can you ensure you used exactly the same alphabet file?
Maybe there's something bogus in your dataset.
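One way to answer the "same alphabet" question definitively is to byte-compare the alphabet file used for trie generation with the one passed to --alphabet_config_path. A minimal sketch; the helper name and example paths are assumptions for illustration, not part of DeepSpeech:

```shell
# check_alphabets FILE1 FILE2
# Prints whether two alphabet files are byte-identical. generate_trie,
# training, and inference must all see exactly the same alphabet.
check_alphabets() {
    if cmp -s "$1" "$2"; then
        echo "alphabets match"
    else
        echo "alphabets differ: regenerate the trie"
    fi
}

# Example (paths assumed from this thread):
# check_alphabets /home/Desktop/mark3/mfit-models/alphabet.txt ../data/alphabet.txt
```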
vocabulary.txt = 1.7MB
lm.binary = 20.1MB
trie = 80 Bytes
So you failed at generating the trie file. Since you have not yet shared how you do that, we can't help you…
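An 80-byte trie is little more than a file header, so a size sanity check right after generation catches this kind of failure early. A sketch; the helper and the threshold are arbitrary assumptions of mine, not a DeepSpeech rule:

```shell
# warn_if_tiny FILE MIN_BYTES
# Flags output files that are suspiciously small, e.g. a trie of only
# 80 bytes produced from a 1.7 MB vocabulary.
warn_if_tiny() {
    size=$(wc -c < "$1")
    size=$((size))   # normalize: some wc implementations pad with spaces
    if [ "$size" -lt "$2" ]; then
        echo "$1 is only $size bytes: generation probably failed"
    else
        echo "$1 looks plausible ($size bytes)"
    fi
}

# Example (1 kB threshold is a guess for a vocabulary this size):
# warn_if_tiny trie 1024
```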
I took the generate_trie binary from native_client.amd64.cpu.linux.tar.xz and ran ./generate_trie ../data/alphabet.txt lm.binary trie to generate the trie.
So that's not the alphabet you are using for the training?!
--alphabet_config_path /home/Desktop/mark3/mfit-models/alphabet.txt
That does not look like the same path as ../data/alphabet.txt…
I am sorry, I used the same one, i.e. /home/Desktop/mark3/mfit-models/alphabet.txt; I pasted the other path here by mistake.
Also, I checked the size of the default lm.binary in ./data/lm: it's 945 MB. Is there some issue with mine being only 20 MB?
The LM size depends on your vocabulary file size, so it might be normal if yours is small.
Please, can we avoid constant round trips and get a clear view at once? Share the exact and accurate command lines, as well as ls output for each of the involved files…
(mark3) root@computer:/home/computer/Desktop/mark3# ls
customModels DeepSpeech indianModel kenlm mfit-models namesModel tensorflow trieModel
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/lmplz --discount_fallback --text mirrorfit.txt --arpa words.arpa --o 3
=== 1/5 Counting and sorting n-grams ===
Reading /home/computer/Desktop/mark3/mfit-models/mirrorfit.txt
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 200003 types 200006
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:2400072 2:9327010816 3:17488144384
Substituting fallback discounts for order 0: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 1: D1=0.5 D2=1 D3+=1.5
Substituting fallback discounts for order 2: D1=0.5 D2=1 D3+=1.5
Statistics:
1 200006 D1=0.5 D2=1 D3+=1.5
2 400006 D1=0.5 D2=1 D3+=1.5
3 200003 D1=0.5 D2=1 D3+=1.5
Memory estimate for binary LM:
type kB
probing 17969 assuming -p 1.5
probing 21094 assuming -r models -p 1.5
trie 10718 without quantization
trie 7864 assuming -q 8 -b 8 quantization
trie 10132 assuming -a 22 array pointer compression
trie 7278 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:2400072 2:6400096 3:4000060
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz VmPeak:26372372 kB VmRSS:22700 kB RSSMax:6075336 kB user:0.876577 sys:1.34088 CPU:2.21748 real:2.16143
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/build_binary -T -s words.arpa lm.binary
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS
The command below doesn't give any output; it just creates the trie file:
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../DeepSpeech/generate_trie alphabet.txt lm.binary trie
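generate_trie being silent on success makes failures easy to miss; its exit status is the only immediate signal. A small wrapper sketch (the helper is my own invention; only the generate_trie invocation in the comment comes from this thread):

```shell
# run_step CMD [ARGS...]
# Runs a command and reports its exit status; useful for tools like
# generate_trie that print nothing on success.
run_step() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok: $*"
    else
        echo "failed (status $status): $*"
    fi
    return "$status"
}

# Example with the command from this thread, then a size check:
# run_step ../DeepSpeech/generate_trie alphabet.txt lm.binary trie
# ls -hal trie
```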
DeepSpeech directory:
(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls
bazel.patch DeepSpeech.py libdeepspeech.so requirements.txt
bin doc LICENSE runNameTrieModel.sh
build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR Dockerfile myDataset stats.py
CODE_OF_CONDUCT.md evaluate.py native_client SUPPORT.rst
CONTRIBUTING.rst evaluate_tflite.py native_client.amd64.cpu.linux.tar.xz taskcluster
data examples __pycache__ transcribe.py
deepspeech generate_trie README.mozilla util
deepspeech-0.6.1-checkpoint GRAPH_VERSION README.rst VERSION
deepspeech-0.6.1-checkpoint.tar.gz images RELEASE.rst
deepspeech.h ISSUE_TEMPLATE.md requirements_eval_tflite.txt
Please let me know if I forgot to mention anything.
Use ls -hal, otherwise it's useless.
I still don't know your alphabet, vocabulary, and new trie file sizes…
I'm pretty sure you lack a trie command-line parameter here.
(mark3) root@computer:/home/computer/Desktop/mark3/DeepSpeech# ls -hal
total 657M
drwxr-xr-x 15 root root 4.0K Feb 6 11:10 .
drwxrwxr-x 10 computer computer 4.0K Feb 6 10:36 ..
-rw-r--r-- 1 root root 11K Feb 5 15:51 bazel.patch
drwxr-xr-x 2 root root 4.0K Feb 5 15:51 bin
-rw-r--r-- 1 root root 173 Feb 5 15:51 build-python-wheel.yml-DISABLED_ENABLE_ME_TO_REBUILD_DURING_PR
-rw-r--r-- 1 root root 60 Feb 5 15:51 .cardboardlint.yml
-rw-r--r-- 1 root root 691 Feb 5 15:51 CODE_OF_CONDUCT.md
-rwxr-xr-x 1 root root 933 Feb 5 15:51 .compute
-rw-r--r-- 1 root root 2.1K Feb 5 15:51 CONTRIBUTING.rst
drwxr-xr-x 5 root root 4.0K Feb 5 15:51 data
-rwxr-xr-x 1 syslog Unix Group\nogroup 892K Jan 10 22:47 deepspeech
drwxr-xr-x 2 501 staff 4.0K Feb 6 14:17 deepspeech-0.6.1-checkpoint
-rw-rw-r-- 1 computer computer 613M Jan 23 17:35 deepspeech-0.6.1-checkpoint.tar.gz
-rw-r--r-- 1 syslog Unix Group\nogroup 8.4K Jan 10 22:45 deepspeech.h
-rwxr-xr-x 1 root root 42K Feb 5 15:51 DeepSpeech.py
drwxr-xr-x 3 root root 4.0K Feb 5 15:51 doc
-rw-r--r-- 1 root root 6.5K Feb 5 15:51 Dockerfile
-rwxr-xr-x 1 root root 6.8K Feb 5 15:51 evaluate.py
-rw-r--r-- 1 root root 4.6K Feb 5 15:51 evaluate_tflite.py
drwxr-xr-x 2 root root 4.0K Feb 5 15:51 examples
-r-xr-xr-x 1 syslog Unix Group\nogroup 2.0M Jan 10 22:47 generate_trie
drwxr-xr-x 9 root root 4.0K Feb 5 16:47 .git
-rw-r--r-- 1 root root 148 Feb 5 15:51 .gitattributes
drwxr-xr-x 2 root root 4.0K Feb 5 15:51 .github
-rw-r--r-- 1 root root 474 Feb 5 15:51 .gitignore
-rw-r--r-- 1 root root 123 Feb 5 15:51 .gitmodules
-rw-r--r-- 1 root root 2 Feb 5 15:51 GRAPH_VERSION
drwxr-xr-x 2 root root 4.0K Feb 5 15:51 images
-rw-r--r-- 1 root root 1.2K Feb 5 15:51 ISSUE_TEMPLATE.md
-r-xr-xr-x 1 syslog Unix Group\nogroup 34M Jan 10 22:47 libdeepspeech.so
-rw-r--r-- 1 syslog Unix Group\nogroup 17K Jan 10 22:45 LICENSE
drwxr-xr-x 3 computer computer 4.0K Jan 29 11:11 myDataset
drwxr-xr-x 9 root root 4.0K Feb 5 15:51 native_client
-rw-rw-r-- 1 computer computer 6.5M Feb 6 10:21 native_client.amd64.cpu.linux.tar.xz
drwxr-xr-x 2 root root 4.0K Feb 6 10:36 __pycache__
-rw-r--r-- 1 root root 18K Feb 5 15:51 .pylintrc
-rw-r--r-- 1 syslog Unix Group\nogroup 1.2K Jan 10 22:45 README.mozilla
-rw-r--r-- 1 root root 5.0K Feb 5 15:51 README.rst
-rw-r--r-- 1 root root 437 Feb 5 15:51 .readthedocs.yml
-rw-r--r-- 1 root root 438 Feb 5 15:51 RELEASE.rst
-rw-r--r-- 1 root root 115 Feb 5 15:51 requirements_eval_tflite.txt
-rw-r--r-- 1 root root 340 Feb 5 15:51 requirements.txt
-rwxr-xr-x 1 computer computer 869 Feb 6 11:08 runNameTrieModel.sh
-rw-r--r-- 1 root root 1.2K Feb 5 15:51 stats.py
-rw-r--r-- 1 root root 1.6K Feb 5 15:51 SUPPORT.rst
drwxr-xr-x 2 root root 20K Feb 5 15:51 taskcluster
-rw-r--r-- 1 root root 2.5K Feb 5 15:51 .taskcluster.yml
-rwxr-xr-x 1 root root 7.6K Feb 5 15:51 transcribe.py
-rw-r--r-- 1 root root 326 Feb 5 15:51 .travis.yml
drwxr-xr-x 3 root root 4.0K Feb 6 10:36 util
-rw-r--r-- 1 root root 6 Feb 5 15:51 VERSION
(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ls -hal
total 44M
drwxr-xr-x 2 root root 4.0K Feb 6 16:22 .
drwxrwxr-x 10 computer computer 4.0K Feb 6 10:36 ..
-rw-r--r-- 1 root root 329 Jan 30 10:30 alphabet.txt
-rw-r--r-- 1 root root 20M Feb 6 16:21 lm.binary
-rw-r--r-- 1 root root 1.7M Jan 31 09:24 mirrorfit.txt
-rw-r--r-- 1 root root 80 Feb 6 16:22 trie
-rw-r--r-- 1 root root 23M Feb 6 16:18 words.arpa
It's not like I told you to look at that script…