[SOLVED] Unable to optimize language model (Segmentation fault, core dumped)

I’ve created lm.binary, vocab-500000.txt, and kenlm.scorer; now, when trying to optimize, I get the error below.

DeepSpeech version: 0.9.1
OS: Ubuntu 18 (Linux)
GPU: Titan V
RAM: 16 GB
Using a conda environment

Steps:

First, I cloned the KenLM repo and followed its build steps to produce the binaries in the build/bin folder.
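For anyone following along, this was roughly the standard KenLM build sequence (a sketch; exact steps may vary depending on your system and installed dependencies such as cmake and Boost):

git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build && cd build
cmake ..
make -j 4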

Then I ran the following command to create lm.binary and vocab-500000.txt (my input is 2.1 GB of text):

python3 data/lm/generate_lm.py --input_txt /home/robodoc/Masoud/Language_model/CLEANED_TEXT/TotalText.txt --output_dir data/lm --top_k 500000 --kenlm_bins /home/robodoc/Masoud/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
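(A quick way to sanity-check that lm.binary loads is to pipe a sentence through KenLM’s query tool, which prints per-word log probabilities; the paths here assume the build above:)

echo "a test sentence" | /home/robodoc/Masoud/kenlm/build/bin/query data/lm/lm.binary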

Then I downloaded native_client.amd64.cuda.linux.tar.xz to get a matching generate_scorer_package binary, and ran:

./generate_scorer_package --alphabet …/alphabet.txt --lm lm.binary --vocab vocab-500000.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
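(For completeness: the generate_scorer_package binary comes from unpacking that archive first, e.g.:)

tar -xJf native_client.amd64.cuda.linux.tar.xz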

Now, when I run the next command (to optimize alpha and beta), it completes Trial 0 but throws the error below in the middle of the next trial (I tested --test_batch_size 2 and 4, same result in both cases):

python lm_optimizer.py --test_files /home/robodoc/Masoud/mozilla_data/fa/corpus/fa/clips/dev_mr.csv --scorer_path data/lm/kenlm.scorer --checkpoint_dir data/load_chkpnt --train_cudnn True --n_hidden 1024 --test_batch_size 4

Test epoch | Steps: 4622 | Elapsed Time: 0:19:11 Fatal Python error: Segmentation fault

Thread 0x00007f226affd700 (most recent call first):
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/connection.py", line 379 in _recv
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/connection.py", line 407 in _recv_bytes
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/connection.py", line 250 in recv
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/pool.py", line 463 in _handle_results
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 864 in run
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f226b7fe700 (most recent call first):
File "/home/robodoc/Masoud/conda_deepspeech0.9.1/DeepSpeech/training/deepspeech_training/util/helpers.py", line 97 in _limit
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/pool.py", line 290 in _guarded_task_generation
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/pool.py", line 419 in _handle_tasks
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 864 in run
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f226bfff700 (most recent call first):
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/multiprocessing/pool.py", line 406 in _handle_workers
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 864 in run
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 916 in _bootstrap_inner
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/threading.py", line 884 in _bootstrap

Thread 0x00007f2406c83740 (most recent call first):
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443 in _call_tf_sessionrun
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350 in _run_fn
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365 in _do_call
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359 in _do_run
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180 in _run
File "/home/robodoc/.local/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956 in run
File "/home/robodoc/Masoud/conda_deepspeech0.9.1/DeepSpeech/training/deepspeech_training/evaluate.py", line 108 in run_test
File "/home/robodoc/Masoud/conda_deepspeech0.9.1/DeepSpeech/training/deepspeech_training/evaluate.py", line 132 in evaluate
File "lm_optimizer.py", line 36 in objective
File "/home/robodoc/.local/lib/python3.6/site-packages/optuna/_optimize.py", line 189 in _run_trial
File "/home/robodoc/.local/lib/python3.6/site-packages/optuna/_optimize.py", line 156 in _optimize_sequential
File "/home/robodoc/.local/lib/python3.6/site-packages/optuna/_optimize.py", line 65 in _optimize
File "/home/robodoc/.local/lib/python3.6/site-packages/optuna/study.py", line 315 in optimize
File "lm_optimizer.py", line 62 in main
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/site-packages/absl/app.py", line 251 in _run_main
File "/home/robodoc/miniconda3/envs/conda_deep_speech/lib/python3.6/site-packages/absl/app.py", line 303 in run
File "lm_optimizer.py", line 70 in <module>
Segmentation fault (core dumped)

Thanks for the help.

Have you tried training on the same material that throws the error? If training fails too, that would mean the audio is causing the error, not the lm_optimizer script. What is that material like (clip length, number of clips)?

Solved.

I checked the CPU thread usage history while optimizing and found the threads were running near their maximum. At some point they all hit their limit and the optimization crashed.
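One way to watch this is top in threads mode:

top -H    # shows per-thread CPU usage; watch the lm_optimizer worker threads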

I tried reducing the batch size to 1 and the problem was solved, but optimization now takes a really long time: since yesterday, only 10 trials have completed.
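For anyone hitting the same crash, the working invocation is the original lm_optimizer.py call with only the batch size changed:

python lm_optimizer.py --test_files /home/robodoc/Masoud/mozilla_data/fa/corpus/fa/clips/dev_mr.csv --scorer_path data/lm/kenlm.scorer --checkpoint_dir data/load_chkpnt --train_cudnn True --n_hidden 1024 --test_batch_size 1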


Interesting. Can you please post more about how much data you are processing?


Please try without conda; we have gotten unreliable reports from conda setups in the past.
