Segmentation fault in ctc_beam_search_decoder_batch on Mac

On Mac I am getting the following error when running inference from checkpoint:

Fatal Python error: Segmentation fault
Thread 0x000000010e058dc0 (most recent call first):
  File "/Users/Jedrzej/DeepSpeech/venv/lib/python3.7/site-packages/ds_ctcdecoder/swigwrapper.py", line 364 in ctc_beam_search_decoder_batch
  File "/Users/Jedrzej/DeepSpeech/venv/lib/python3.7/site-packages/ds_ctcdecoder/__init__.py", line 128 in ctc_beam_search_decoder_batch
  File "/Users/Jedrzej/DeepSpeech/gpu_worker.py", line 199 in run_transcribe
  File "/Users/Jedrzej/DeepSpeech/gpu_worker.py", line 214 in evaluate
  File "/Users/Jedrzej/DeepSpeech/gpu_worker.py", line 240 in main
  File "/Users/Jedrzej/DeepSpeech/venv/lib/python3.7/site-packages/absl/app.py", line 250 in _run_main
  File "/Users/Jedrzej/DeepSpeech/venv/lib/python3.7/site-packages/absl/app.py", line 299 in run
  File "/Users/Jedrzej/DeepSpeech/gpu_worker.py", line 247 in <module>

I am using the master branch with the newest ds-ctcdecoder package:

ds-ctcdecoder        0.7.0a2 

I have tested both my own scorer and the one from data/lm.
There are no errors under Ubuntu 18.04, though.

We have not made a new alpha release yet, and we have merged a few changes since; I’m unsure whether they are still compatible. Can you reproduce with the 0.7.0 alpha 2 tag?

No, I cannot reproduce with the alpha 2 tag. I’ll stay on the alpha 2 tag for now and check again after a new tag is published.

I’m getting a Segmentation fault, too. I’m on CentOS.

I used the generate_lm.py script and then created the package with:

python3 generate_package.py --lm lm.binary --vocab librispeech-vocab-500k.txt --package test.scorer --default_alpha 0.8 --default_beta 1.85

I tried reinstalling the ctc_decoder package, and I also tried both the 0.7.0a2 and 0.7.0a3 alphas.
Maybe it’s the KenLM package, the ctc_decoder, or some dependency.

Does anybody have any idea how I can solve this problem?

Test epoch | Steps: 0 | Elapsed Time: 0:00:00
2020-03-30 14:30:36.584604: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2020-03-30 14:30:36.810027: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
Fatal Python error: Segmentation fault

Thread 0x00007f3c9e8cc200 (most recent call first):
  File "~/.site-packages/lib64/3.7.4_intel_2019.6_mavx/site-packages/ds_ctcdecoder/swigwrapper.py", line 364 in ctc_beam_search_decoder_batch
  File "~/.site-packages/lib64/3.7.4_intel_2019.6_mavx/site-packages/ds_ctcdecoder/__init__.py", line 128 in ctc_beam_search_decoder_batch
  File "evaluate.py", line 116 in run_test
  File "evaluate.py", line 134 in evaluate
  File "evaluate.py", line 147 in main
  File "~/.site-packages/lib64/3.7.4_intel_2019.6_mavx/site-packages/absl/app.py", line 250 in _run_main
  File "~/.site-packages/lib64/3.7.4_intel_2019.6_mavx/site-packages/absl/app.py", line 299 in run
  File "evaluate.py", line 156 in <module>
./test.sh: line 14: 49508 Segmentation fault
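The “Fatal Python error: Segmentation fault” tracebacks in this thread are produced by Python’s built-in faulthandler module. If your own crash only shows the shell-level “Segmentation fault” line, you can get the Python-level stack the same way, without editing any script, by enabling faulthandler from the environment. A minimal sketch (not DeepSpeech-specific):

```shell
# PYTHONFAULTHANDLER=1 enables faulthandler at interpreter startup, so a
# segfault inside a C extension (like the SWIG decoder) dumps the Python stack.
PYTHONFAULTHANDLER=1 python3 -c 'import faulthandler; print(faulthandler.is_enabled())'
# prints: True
```

Run the failing test.sh the same way (PYTHONFAULTHANDLER=1 ./test.sh) to see which Python call triggers the crash.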

P.S.
With the provided kenlm.scorer file, the training phase works.

The scorer is not used at the training step.

Maybe; who knows, if you don’t share your setup and what you did?

Thanks for helping.
Here is a bit more information about the system.

The scorer is not used at the training step.

Sorry, I meant the test step.

Steps:
clone DeepSpeech.git
clone kpu/kenlm

mkdir build
cd build
cmake ..
make

Add the resulting build/bin directory to the PATH environment variable.
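The PATH step might look like the following; the exact build location (~/kenlm/build) is an assumption:

```shell
# lmplz and build_binary end up in build/bin after `make`; putting that
# directory on PATH lets generate_lm.py find them.
KENLM_BIN="$HOME/kenlm/build/bin"   # hypothetical build location
export PATH="$KENLM_BIN:$PATH"
```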

Install dependencies:
python3 -m pip install -r requirements.txt
python3 -m pip install $(python3 util/taskcluster.py --decoder --target .) --upgrade

cd data/lm
python3 generate_lm.py

python3 generate_package.py --lm lm.binary --vocab librispeech-vocab-500k.txt --package test.scorer --default_alpha 0.8 --default_beta 1.85

python3 evaluate.py --alphabet_config_path data/alphabet_de.txt --test_files ~/clips/test.csv --test_batch_size 1 --log_level 0 --checkpoint_dir ~/checkpoint_training_tuda-1024-2020.03.13 --export_dir ~/export_training_tuda-1024-2020.03.13 --scorer_path ./data/lm/test.scorer

I have already trained a little bit.
Training already works with the provided scorer file,
so it should also work with my own one.
Or can I not swap in a different scorer file after training?

I don’t know what is worth sharing:
Operating System: CentOS Linux 7 (Core)
nvidia-gpu Driver Version: 440.64.00 CUDA Version: 10.2
Python 3.7.4
pip Packages:
Package Version
------- -------
ds-ctcdecoder 0.7.0a3
tensorboard 1.15.0
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.0
tensorflow 1.15.0
sox 1.3.7
wheel 0.34.2
setuptools 46.1.3
numpy 1.18.2

What provided file?

Are you sure those steps worked?

What provided file?

The kenlm.scorer file from the repo.

Are you sure those steps worked?

These are the results:

Downloading http://www.openslr.org/resources/11/librispeech-lm-norm.txt.gz into tmp/upper.txt.gz...
Converting to lower case and counting word frequencies...
Creating ARPA file...
=== 1/5 Counting and sorting n-grams ===
Reading ~/tmp/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 803288729 types 973676
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:11684112 2:15712154624 3:29460291584 4:47136464896  5:68740677632
Statistics:
1 973676 D1=0.647192 D2=1.04159 D3+=1.3919
2 41161096 D1=0.723617 D2=1.06317 D3+=1.36127
3 49484133/207278547 D1=0.804357 D2=1.09256 D3+=1.31993
4 60615302/438095063 D1=0.876863 D2=1.15052 D3+=1.32047
5 42225053/587120377 D1=0.914203 D2=1.27108 D3+=1.35262
Memory estimate for binary LM:
type      MB
probing 4211 assuming -p 1.5
probing 5080 assuming -r models -p 1.5
trie    2244 without quantization
trie    1281 assuming -q 8 -b 8 quantization 
trie    1899 assuming -a 22 array pointer compression
trie     936 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:11684112 2:658577536 3:989682660 4:1454767248 5:1182301484
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
  *******#############################################################################################
=== 4/5 Calculating and writing order-interpolated probabilities ===
Chain sizes: 1:11684112 2:658577536 3:989682660 4:1454767248 5:1182301484
    ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
 ####################################################################################################
=== 5/5 Writing ARPA model ===
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Name:lmplz      VmPeak:158273304 kB     VmRSS:32012 kB  RSSMax:44293008 kB      user:668.277    sys:127.414     CPU:795.691     real:548.043
Filtering ARPA file...
Reading tmp/lm.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
   ****************************************************************************************************
Building lm.binary...
Reading tmp/lm_filtered.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Identifying n-grams omitted by SRI
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Quantizing
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Writing trie
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
SUCCESS

Output of generate_package.py:

500000 unique words read from vocabulary file.
Doesn’t look like a character based model.
Package created in test.scorer

So it means it’s your scorer creation that fails.

What’s the file’s size?

I think so too.

File sizes:
5.8G tmp/lm_filtered.arpa
898M lm.binary
898M test.scorer

Where is your alphabet here?

Yep, I forgot the alphabet.
I am sorry.
Thanks a lot!

https://github.com/mozilla/DeepSpeech/pull/2863 should avoid that mistake in the future.
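For reference, here is what the corrected packaging step would look like. The --alphabet flag name and the alphabet path are assumptions based on the 0.7-era generate_package.py, and the guard only makes the sketch degrade gracefully when run outside a DeepSpeech checkout:

```shell
# Hypothetical corrected command: the only change from the failing invocation
# earlier in the thread is the added --alphabet argument.
# Intended to be run from DeepSpeech/data/lm.
if [ -f generate_package.py ]; then
  python3 generate_package.py \
    --alphabet ../alphabet.txt \
    --lm lm.binary \
    --vocab librispeech-vocab-500k.txt \
    --package test.scorer \
    --default_alpha 0.8 --default_beta 1.85
  STATUS=packaged
else
  echo "generate_package.py not found here; run from DeepSpeech/data/lm"
  STATUS=skipped
fi
```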