Trying to train my own model using native_client, getting error - libctc_decoder_with_kenlm.so: undefined symbol


#1

A little background: I wasn’t able to install deepspeech-gpu using the pip installation method, but I was able to build tensorflow and deepspeech from source. I’m trying to train my own model, so I cloned the deepspeech repo and then used:

./utils/taskcluster.py --target ./native_client --arch gpu

Downloading https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.gpu/artifacts/public/native_client.tar.xz
Downloading: 100%

generate_trie
libctc_decoder_with_kenlm.so
libdeepspeech.so
libdeepspeech_utils.so
LICENSE
deepspeech
README.mozilla

But when I do:

./DeepSpeech.py --decoder_library_path ./native_client/libctc_decoder_with_kenlm.so

I get the following error:

tensorflow.python.framework.errors_impl.NotFoundError: ./libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

My tensorflow version is 1.5.0-rc1

Cuda version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

My GPU is Quadro P1000

I’m able to use the pretrained models and I can see it is using the GPU. When I was looking around it said I might have to update my deepspeech version but I’m afraid to do it because it took me a while to get the deepspeech-gpu stuff working. Any help is appreciated! Thanks.


(Lissyx) #2

If you are training, you don’t need to build anything yourself. Just pip install tensorflow-gpu as documented at https://github.com/mozilla/DeepSpeech/blob/master/README.md#installing-prerequisites-for-training

Currently, this should get you tensorflow-gpu==1.5.0 rather than 1.5.0-rc1, and that should work properly with the libctc_decoder_with_kenlm.so that you downloaded.
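The release vs. release-candidate distinction can be checked mechanically once you have the version string (e.g. from `tf.__version__`). A minimal sketch; the helper name is made up for illustration:

```python
def is_release_build(version_string):
    """True for a final release like '1.5.0'; False for pre-releases like '1.5.0-rc1'."""
    release, _, suffix = version_string.partition("-")
    return suffix == "" and all(part.isdigit() for part in release.split("."))

# The problematic source build from this thread vs. the expected pip package:
print(is_release_build("1.5.0-rc1"))  # False
print(is_release_build("1.5.0"))      # True
```

The prebuilt native_client artifacts are compiled against the final release, which is why an `-rc` TensorFlow can produce `undefined symbol` errors at `tf.load_op_library` time.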


#3

I was able to get TensorFlow to 1.5.0 and I’m still having the issue.

Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
>>> print(tf.__version__)
1.5.0


(Lissyx) #4

I am sorry to insist, but please verify your setup properly and make sure you are not mixing virtualenvs and libs. I have tensorflow-gpu==1.5.0 installed and libctc_decoder_with_kenlm.so works here.


(Lissyx) #5
(tf-venv-master) $ python -u DeepSpeech.py --decoder_library_path /home/alexandre/tmp/ds-gpu/libctc_decoder_with_kenlm.so  --train_files data/ldc93s1/ldc93s1.csv   --dev_files data/ldc93s1/ldc93s1.csv   --test_files data/ldc93s1/ldc93s1.csv   --train_batch_size 1   --dev_batch_size 1   --test_batch_size 1   --n_hidden 1 --epoch 1 --learning_rate 0.01 --checkpoint_dir test-training/ --export_dir test-training/
W Parameter --validation_step needs to be >0 for early stopping to work
------------------------------------------------------------------------
WARNING: libdeepspeech failed to load, resorting to deprecated code
         Refer to README.md for instructions on installing libdeepspeech
------------------------------------------------------------------------
I STARTING Optimization
I Training of Epoch 0 - loss: 356.919312
I FINISHED Optimization - training time: 0:00:00
I Test of Epoch 1 - WER: 1.000000, loss: 355.41058349609375, mean edit distance: 1.000000
I --------------------------------------------------------------------------------
I WER: 1.000000, loss: 355.410583, mean edit distance: 1.000000
I  - src: "she had your dark suit in greasy wash water all year"
I  - res: "i dehydratedededededededededededededededededededededededede"
I --------------------------------------------------------------------------------
I Exporting the model...
Converted 14 variables to const ops.
I Models exported at test-training/
(tf-venv-master) $ pip list | grep tensorflow
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
tensorflow-gpu (1.5.0)
tensorflow-tensorboard (1.5.1)
$ sha256sum ~/tmp/ds-gpu/libctc_decoder_with_kenlm.so 
d269cc9a304baaa09ced0e9b4a93e864056ecee979e04ec36116f44011488876  /home/alexandre/tmp/ds-gpu/libctc_decoder_with_kenlm.so

#7

Thanks again for the help! To update it I did the following:

$ pip install --upgrade --user tensorflow-gpu

Below is what I got when I ran the command you suggested.

username@server:~/Audio/DeepSpeech$ python -u ./DeepSpeech.py --decoder_library_path ./native_client/libctc_decoder_with_kenlm.so --train_files ./data/ldc93s1/ldc93s1.csv --dev_files ./data/ldc93s1/ldc93s1.csv --test_files ./data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --n_hidden 1 --epoch 1 --learning_rate 0.01 --checkpoint_dir ./check_dir/ --export_dir ./export_dir/
Traceback (most recent call last):
  File "./DeepSpeech.py", line 1838, in <module>
    tf.app.run()
  File "/home/username/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "./DeepSpeech.py", line 1790, in main
    initialize_globals()
  File "./DeepSpeech.py", line 334, in initialize_globals
    custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
  File "/home/username/.local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/username/.local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: ./native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringEv

(Lissyx) #8

Please, I’ve asked you to verify the sha256 of your library. Also, make sure your install has not silently updated to tensorflow-gpu 1.6.0-rcX …


(Megha ) #9

@lissyx, I am facing this problem when I am trying to train my model. Below is the command I am giving for training.

python -u DeepSpeech.py \
  --checkpoint_dir checkpoints1 \
  --checkpoint_step 1 \
  --decoder_library_path libctc_decoder_with_kenlm.so \
  --dropout_rate 0.2367 \
  --default_stddev 0.046875 \
  --epoch 13 \
  --export_dir /my_exportdir/model.pb \
  --train_files CVD/cv-valid-train.csv,CVD/cv-other-train.csv \
  --dev_files CVD/cv-valid-dev.csv \
  --test_files CVD/cv-valid-test.csv \
  --train_batch_size 12 \
  --dev_batch_size 8 \
  --test_batch_size 8 \
  --learning_rate 0.0001 \
  --display_step 0 \
  --validation_step 1 \
  --log_level 0 \
  --summary_dir summary3 \
  --summary_secs 60 \
  --initialize_from_frozen_model models/output_graph.pb

$ pip list | grep tensorflow
tensorflow (1.5.0)
tensorflow-gpu (1.4.0)
tensorflow-tensorboard (1.5.1)

Traceback (most recent call last):
  File "DeepSpeech.py", line 1838, in <module>
    tf.app.run()
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 124, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 1790, in main
    initialize_globals()
  File "DeepSpeech.py", line 334, in initialize_globals
    custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/megha/Alu_Meg/DeepSpeech_Alug_Meg/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: libctc_decoder_with_kenlm.so: cannot open shared object file: No such file or directory

Could you please help me if I am missing something?
Thanks :slight_smile:


(Lissyx) #10

I’m sorry, but I don’t see how I can help; Python is already pretty clear about your issue: cannot open shared object file: No such file or directory. The path is invalid.


(Megha ) #11

@lissyx, I see 2 instances of libctc_decoder_with_kenlm.so:

  1. in the DeepSpeech folder
  2. in DeepSpeech/native_client (I placed native_client there by running python util/taskcluster.py --target Deepspeech --arch gpu in the DeepSpeech folder).
    So which one should I give in my command while running DeepSpeech?

And I have one more question. I have been getting this error for a long time and saw some comments on it; some say it is just a warning. Is that true, or will it cause problems when I am training?
Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA


(Megha ) #12

@lissyx, I think I have clarified my doubt now with python -u DeepSpeech.py --help:

--decoder_library_path: path to the libctc_decoder_with_kenlm.so library containing the decoder implementation.
(default: 'native_client/libctc_decoder_with_kenlm.so')

So we should use the one inside native_client. Now, doing that, I get:

tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow20OpKernelConstruction21CtxFailureWithWarningEPKciRKNS_6StatusE

could you comment on this please ?

Thanks :slight_smile:


(Lissyx) #13

From the warning that TensorFlow shows, it means you built it yourself without any optimizations enabled. Please refer to the upstream TensorFlow documentation on how to enable up to AVX2/FMA.

Your other error is also typical of a misbuild: please document how you built, using which source and commands.
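For what it’s worth, the mangled name in these errors already tells you which C++ API is missing. Below is a toy parser for this one symbol shape (c++filt is the proper tool; this sketch only handles simple `_ZN…E` nested names from the Itanium ABI, not substitutions or templates):

```python
def demangle_nested_name(symbol):
    """Tiny parser for Itanium-mangled nested names: _ZN<len><id><len><id>...E.
    Just enough to see which C++ class/method an 'undefined symbol' refers to."""
    assert symbol.startswith("_ZN"), "only plain nested names are handled"
    i, parts = 3, []
    while symbol[i] != "E":
        j = i
        while symbol[j].isdigit():       # read the decimal length prefix
            j += 1
        length = int(symbol[i:j])
        parts.append(symbol[j:j + length])  # then that many identifier chars
        i = j + length
    return "::".join(parts)

print(demangle_nested_name(
    "_ZN10tensorflow20OpKernelConstruction21CtxFailureWithWarningEPKciRKNS_6StatusE"))
# tensorflow::OpKernelConstruction::CtxFailureWithWarning
```

`OpKernelConstruction::CtxFailureWithWarning` was added to the TensorFlow C++ API after 1.5, which is why a library built against newer headers fails to load into an older TensorFlow, and vice versa: the two sides must be built from matching versions.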


(Megha ) #14

@lissyx, I saw your comments on this issue (https://github.com/mozilla/DeepSpeech/issues/1223). You mentioned "That’s because you have not switched to TensorFlow 1.6.0-rc0 :slight_smile:"

for the undefined symbol: _ZN10tensorflow20OpKernelConstruction21CtxFailureWithWarningEPKciRKNS_6StatusE error.

So should I also switch?


(Lissyx) #15

This comment might not be valid in your case. But again, without knowing what you did, I cannot help more.


(Lissyx) #16

I also see, @meghagowda5193, that you have both tensorflow and tensorflow-gpu in the same Python env; no idea how that can end well …
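This class of problem can be spotted by scanning the installed distribution names for mutually exclusive TensorFlow builds. A sketch with a pure helper (in practice you would feed it the names from pip list; the helper name is made up for illustration):

```python
def tensorflow_conflicts(installed):
    """Given installed distribution names, flag mutually exclusive TensorFlow builds.
    tensorflow and tensorflow-gpu ship the same 'tensorflow' import package, so
    installing both means one silently shadows the other."""
    variants = {"tensorflow", "tensorflow-gpu"}
    present = sorted(variants & {name.lower() for name in installed})
    return present if len(present) > 1 else []

# The pip list from post #9 above has both, plus a mismatched 1.4.0:
print(tensorflow_conflicts(["tensorflow", "tensorflow-gpu", "tensorflow-tensorboard"]))
# ['tensorflow', 'tensorflow-gpu']
```

The fix is to pip uninstall both, then install exactly one of them (tensorflow-gpu here) at the version the native_client artifacts were built against.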