Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state Segmentation fault (core dumped)

Hi,

I am trying to train my own model on about 1,000 files, which comes to roughly 1 hour of data.
These are the steps I followed:

git clone https://github.com/mozilla/DeepSpeech.git
This checks out version 0.6.0-alpha.4.

virtualenv -p python3 venv
source venv/bin/activate

pip3 install -r requirements.txt
pip3 install $(python util/taskcluster.py --decoder)

pip3 uninstall tensorflow
pip3 install 'tensorflow-gpu==1.14.0' (since I am using a GPU)

pip3 install deepspeech-gpu

Run python util/taskcluster.py --arch gpu --target native_client

My corpus contains
corpus-train.csv, corpus-test.csv, and corpus-dev.csv, and a Wav folder containing all the wav files.
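For reference, each of these CSVs follows the three-column layout DeepSpeech expects; a minimal sketch with a hypothetical row (the filename and size are made up):

```shell
# DeepSpeech CSVs need exactly these columns: wav_filename, wav_filesize, transcript.
# The row below is illustrative only; real paths point into the Wav folder.
printf 'wav_filename,wav_filesize,transcript\n' > corpus-train.csv
printf 'Wav/sample0001.wav,123456,this is a sample transcript\n' >> corpus-train.csv
head -1 corpus-train.csv
```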

Creating the language model:

git clone https://github.com/kpu/kenlm.git

cd kenlm/

mkdir build

cd build

cmake ..

make -j 4

Then I create a new directory called my-model and put the language model files in this directory.
vim alphabet.txt
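My alphabet.txt lists one output character per line, with the space character on its own line. A truncated sketch of what I mean (a real English alphabet would continue through z):

```shell
# alphabet.txt: one character per line; the first line here is a single space.
# Truncated illustration only -- the real file covers the full character set.
printf ' \na\nb\nc\nd\n' > alphabet.txt
wc -l < alphabet.txt
```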

vim some.txt (this is the text corpus)
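The corpus text has to match the characters in alphabet.txt, so I normalize it first. A rough sketch of that step (lowercasing and stripping out-of-alphabet characters; raw.txt is a made-up input file):

```shell
# Normalize a hypothetical raw text file so every character appears in alphabet.txt:
printf 'Hello, World!\nSecond LINE.\n' > raw.txt
tr '[:upper:]' '[:lower:]' < raw.txt | tr -cd 'a-z \n' > some.txt
cat some.txt
# prints:
# hello world
# second line
```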

…/kenlm/build/bin/lmplz -o 5 <some.txt >lm.arpa

…/kenlm/build/bin/build_binary lm.arpa lm.binary

…/DeepSpeech/native_client/generate_trie alphabet.txt lm.binary trie

Then I run this script:
nohup python3 -u DeepSpeech.py \
  --train_files "/home/dev_ds/deepspeech_test/corpus/corpus-train.csv" \
  --dev_files "/home/dev_ds/deepspeech_test/corpus/corpus-dev.csv" \
  --test_files "/home/dev_ds/deepspeech_test/corpus/corpus-test.csv" \
  --alphabet_config_path "/home/dev_ds/deepspeech_test/my-model/alphabet.txt" \
  --lm_binary_path "/home/dev_ds/deepspeech_test/my-model/lm.binary" \
  --lm_trie_path "/home/dev_ds/deepspeech_test/my-model/trie" \
  --learning_rate 0.001 \
  --dropout_rate 0.05 \
  --word_count_weight 3.5 \
  --train_batch_size 4 \
  --log_level 1 \
  --display_step 1 \
  --checkpoint_dir /home/dev_ds/.local/share/deepspeech/checkpoints2/ \
  --epoch 75 \
  --export_dir "/home/dev_ds/deepspeech_test/my-model" \
  &>> background1.log &
The training completes and the output_graph.pb file gets exported to the my-model folder.

After that, I try to run inference on an audio file to check its transcript.
I do that using the command:

deepspeech --model output_graph.pb --alphabet alphabet.txt --lm lm.binary --trie trie --audio /home/dev_ds/CMUSphinx_mix/database/wav/PBE/PBE13881chunk10.wav

After running this I get the following error:

Loading model from file output_graph.pb
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-08-09 16:03:23.375566: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-09 16:03:23.486778: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-09 16:03:23.487394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: GeForce GT 730 major: 3 minor: 5 memoryClockRate(GHz): 0.9015
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 135.12MiB
2019-08-09 16:03:23.487439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-08-09 16:03:24.116975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-09 16:03:24.117013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2019-08-09 16:03:24.117025: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2019-08-09 16:03:24.117201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 95 MB memory) -> physical GPU (device: 0, name: GeForce GT 730, pci bus id: 0000:01:00.0, compute capability: 3.5)
2019-08-09 16:03:24.173693: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-08-09 16:03:24.173737: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-08-09 16:03:24.173766: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-08-09 16:03:24.173903: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Loaded model in 0.801s.
Loading language model from files lm.binary trie
Error: Trie file version mismatch (4 instead of expected 3). Update your trie file.
Loaded language model in 0.000412s.
Running inference.
2019-08-09 16:03:24.188478: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-08-09 16:03:24.225981: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-08-09 16:03:24.302261: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-08-09 16:03:24.302321: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
2019-08-09 16:03:24.521761: W tensorflow/core/framework/allocator.cc:124] Allocation of 134217728 exceeds 10% of system memory.
Error running session: Not found: PruneForTargets: Some target nodes not found: initialize_state
Segmentation fault (core dumped)

Kindly let me know how to solve this.

@vibha.shahani You already opened an issue on GitHub; can you please avoid cross-posting? You won't get more help by spamming us.

And @reuben already explained to you what to do: https://github.com/mozilla/DeepSpeech/issues/2295#issuecomment-519884682