Error after following installation steps

I followed the installation steps for CPU training of DeepSpeech, and it succeeded.
(Interesting fact: the TensorFlow version is not pinned in requirements.txt; should it be 1.3.0?)
However, the command bin/run-ldc93s1.sh failed with this message:

+ '[' '!' -f data/ldc93s1/ldc93s1.csv ']'
+ echo 'Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.'
Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
+ python -u bin/import_ldc93s1.py ./data/ldc93s1
Successfully downloaded LDC93S1.wav 93638 bytes.
Successfully downloaded LDC93S1.txt 62 bytes.
+ '[' -d '' ']'
++ python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))'
+ checkpoint_dir=/home/f-minkin/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir /home/f-minkin/.local/share/deepspeech/ldc93s1
Traceback (most recent call last):
  File "DeepSpeech.py", line 482, in <module>
    custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
  File "/home/f-minkin/tmp/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
  File "/home/f-minkin/tmp/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow10DEVICE_CPUE

Moreover, if I follow the GPU instructions:

python util/taskcluster.py --target /tmp --source tensorflow --arch gpu --artifact tensorflow_gpu_warpctc-1.3.0rc0-cp27-cp27mu-linux_x86_64.whl
pip install /tmp/tensorflow_gpu_warpctc-1.3.0rc0-cp27-cp27mu-linux_x86_64.whl

I got a TensorFlow exception:

+ '[' '!' -f data/ldc93s1/ldc93s1.csv ']'
+ '[' -d '' ']'
++ python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))'
+ checkpoint_dir=/home/f-minkin/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir /home/f-minkin/.local/share/deepspeech/ldc93s1
2017-11-27 23:07:08.360678: F tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.
./bin/run-ldc93s1.sh: line 29: 210003 Aborted                 python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir "$checkpoint_dir" "$@"

Thanks! So your system does not have AVX2; you will have to build the Python package from the https://github.com/mozilla/tensorflow master branch. You can follow the instructions in DeepSpeech's README for that :slight_smile:

Can you also share more details about your system? Are you using a VM on some provider? And could you post the output of cat /proc/cpuinfo so others can find it? Thanks :slight_smile:
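
For anyone landing here later, a quick way to check this yourself is something like the following (a minimal sketch; assumes Linux, where /proc/cpuinfo is available):

```shell
# Check whether this machine's CPU advertises the AVX2 flag that the
# prebuilt wheels assume (reads the flags line of /proc/cpuinfo):
if grep -qw avx2 /proc/cpuinfo; then
  echo "AVX2 supported"
else
  echo "No AVX2 - rebuild TensorFlow from source"
fi
```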

My machine has 32 cores, so it's a bit of a flood; I'll post a link to a pastebin:
Link

Thanks, it confirms:
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt

There's avx but no avx2. So far, the only solution is for you to rebuild the Python TensorFlow package; it should take some time, but not that much on a machine like yours.
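
For reference, the rebuild usually looks roughly like this, assuming Mozilla's TensorFlow fork and the standard upstream pip-package flow (branch, configure answers, and output paths here are illustrative; the README is authoritative):

```shell
# Build the TensorFlow pip package from source so it targets *your* CPU
# instead of assuming AVX2 (output directory is illustrative):
git clone https://github.com/mozilla/tensorflow.git
cd tensorflow
./configure        # interactive; the defaults build for the host CPU
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```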

Thanks a lot!

What about requirements.txt? It just says "tensorflow" with no version, so it'll download 1.4.0? I understand that it's not strictly necessary since we compile from source, but shouldn't it be 'tensorflow == 1.3.0' anyway?

P.S. The information about CPU requirements might be very helpful to others who want to try this; it would be a good thing to add to the README, in my opinion.

It is already documented in the README that we require a CPU with at least AVX2 and FMA. Regarding requirements.txt, we should probably fix that in a more proper way. The current issue is that since TensorFlow has no API stability guarantee outside of Python, even tensorflow==1.3.0 might run into trouble; people have reported problems with upstream 1.3.0 and our libctc_decoder_with_kenlm.so.

You can probably open an issue and make a PR to change that. We could use the TaskCluster link: https://index.taskcluster.net/v1/task/project.deepspeech.tensorflow.pip.master.gpu/artifacts/public/tensorflow_gpu_warpctc-1.3.0rc0-cp27-cp27mu-linux_x86_64.whl

This would assume that people wanting to train have a CUDA-enabled setup (which is probably fine). Happy to review your issue and PR on that :slight_smile:

Yep, thanks! I got the CUDA-enabled setup working, though it fails at decoding:

Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)

You have not set up git-lfs, so there is no language model; please read the README for this.
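
The telltale sign is the first line of lm.binary: a git-lfs checkout that hasn't been materialized leaves a small text pointer file behind instead of the real model. A minimal sketch of detecting that state (the pointer contents mirror the error above, but the demo file location is made up):

```shell
# Simulate the broken checkout: a git-lfs pointer file where the
# language model binary should be (demo path, not the real repo layout):
mkdir -p /tmp/lm-demo
printf 'version https://git-lfs.github.com/spec/v1\n' > /tmp/lm-demo/lm.binary

# Quick check: is the file still an LFS pointer rather than a real binary?
if head -n 1 /tmp/lm-demo/lm.binary | grep -q '^version https://git-lfs'; then
  echo "still a git-lfs pointer: run 'git lfs install' and 'git lfs pull'"
fi
```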

Oh, it's in the FAQ section and in the README; I'm so sorry for not paying full attention to it.

I'll try to make a PR ASAP, thanks.

FYI, I'm covering similar changes for when we switch to r1.4: we will have a task running on TaskCluster, testing training of the model against upstream TensorFlow with our libctc_decoder_with_kenlm.so :slight_smile:

Yeah, I saw it in Issues :slight_smile:

I've also run into the AVX2 error and do not see avx2 in my /proc/cpuinfo flags. I assume I also need to rebuild the Python TensorFlow package.

My setup:

  • OS: Debian 9
  • Python: Python 3.6, built from source

BTW, DeepSpeech is Python 3 compatible, right?

Yes, it should all run properly with Python 3.
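
If in doubt, a quick check of which major version your environment resolves (nothing DeepSpeech-specific, just the interpreter):

```shell
# Print the major version of the python3 on PATH; should be 3:
python3 -c 'import sys; print(sys.version_info[0])'
```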

I installed TensorFlow from source, but it appears that I still haven't gotten rid of the AVX2 requirement. It would be helpful if someone could provide the ./configure and bazel build commands that will surely work.

If you follow TensorFlow's build instructions, there's no way you end up with a package that depends on AVX2 if your system does not support it. Which configure and bazel build commands did you use?

Going back to the start.

I pip3-installed deepspeech. Then, when I tested it at the command line, the error I got was

"The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine."

So, I pip3-uninstalled deepspeech and tensorflow. Then I configured and bazel-built tensorflow. Then I pip3-installed deepspeech again, tried again, and got the same AVX2 error.

My guess is that I'm not running ./configure or bazel build correctly, so I'm asking what the recommended commands are for a no-GPU configuration.

Does this make sense?

It makes sense, but you are doing something different from what started this topic :slight_smile:, where the first poster was having issues during training. Basically, any package you pip install is one we built with AVX2 support.

In your case, you need to follow native_client/README.md for building the Python bindings (and also building TensorFlow; it should all be documented there). When doing so, no optimization is forced on you, and thus it should run properly.

So, pip uninstall anything you installed earlier, and follow that.

Besides, out of curiosity, what is your CPU / system?

Apologies for the confusion! I will try as recommended.

CPU: Intel® Core™ i7-3770 CPU @ 3.40GHz
OS: Debian 9

My confusion was due to the use of multiple README files in the DeepSpeech repository. You are referencing the instructions in DeepSpeech/native_client/README.md, whereas I was referencing the installation README in the root directory. I just noticed the root README tells me to check out the other README if installation fails. Oh well.