I followed the installation steps for CPU training with DeepSpeech, and they succeeded.
(Interestingly, requirements.txt does not state a TensorFlow version; should it be 1.3.0?)
However, the command bin/run-ldc93s1.sh failed with this message:
+ '[' '!' -f data/ldc93s1/ldc93s1.csv ']'
+ echo 'Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.'
Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
+ python -u bin/import_ldc93s1.py ./data/ldc93s1
Successfully downloaded LDC93S1.wav 93638 bytes.
Successfully downloaded LDC93S1.txt 62 bytes.
+ '[' -d '' ']'
++ python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))'
+ checkpoint_dir=/home/f-minkin/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir /home/f-minkin/.local/share/deepspeech/ldc93s1
Traceback (most recent call last):
File "DeepSpeech.py", line 482, in <module>
custom_op_module = tf.load_op_library(FLAGS.decoder_library_path)
File "/home/f-minkin/tmp/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/f-minkin/tmp/deepspeech-venv/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: native_client/libctc_decoder_with_kenlm.so: undefined symbol: _ZN10tensorflow10DEVICE_CPUE
+ '[' '!' -f data/ldc93s1/ldc93s1.csv ']'
+ '[' -d '' ']'
++ python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))'
+ checkpoint_dir=/home/f-minkin/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir /home/f-minkin/.local/share/deepspeech/ldc93s1
2017-11-27 23:07:08.360678: F tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine.
./bin/run-ldc93s1.sh: line 29: 210003 Aborted python -u DeepSpeech.py --train_files data/ldc93s1/ldc93s1.csv --dev_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --dev_batch_size 1 --test_batch_size 1 --n_hidden 494 --epoch 50 --checkpoint_dir "$checkpoint_dir" "$@"
lissyx
Thanks! So your system does not have AVX2; you will have to build the Python package from the master branch of https://github.com/mozilla/tensorflow. You can follow the instructions in DeepSpeech's README for that.
lissyx
Can you also share more details about your system? Are you using a VM from some provider? Please also post the output of cat /proc/cpuinfo so others can find it. Thanks.
There's AVX but no AVX2. So far, the only solution is to rebuild the Python TensorFlow package; that should only take some time, and with that kind of machine not that much.
What about requirements.txt? It just says "tensorflow" with no version, so it'll download 1.4.0? I understand that this doesn't matter when compiling from source, but should it say "tensorflow == 1.3.0" anyway?
P.S. The information about the processor requirements might be very helpful to others who want to try it; in my opinion it would be a good thing to add to the README.
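For anyone who wants to check this on their own checkout, here is a rough sketch that lists the entries of a requirements file that carry no exact `==` pin. The function name and the sample contents are made up for illustration; they are not taken from the DeepSpeech repository.

```python
import re


def unpinned_requirements(requirements_text):
    """Return package names that appear without an exact '==' pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            # Keep only the leading package name, without any specifier.
            name = re.match(r"[A-Za-z0-9._-]+", line)
            unpinned.append(name.group(0) if name else line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical file contents, only to show the idea.
    sample = "pandas\ntensorflow\nnumpy == 1.13.3\n"
    print(unpinned_requirements(sample))
```

An entry reported here gets whatever release pip considers newest at install time, which is how a plain "tensorflow" line can end up pulling 1.4.0.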
lissyx
It is already documented in the README that we require a CPU with at least AVX2 and FMA. Regarding requirements.txt, we should probably fix that in a more proper way. The current issue is that, since TensorFlow has no API stability guarantee besides Python, even tensorflow==1.3.0 might get into trouble; people reported problems with upstream 1.3.0 and our libctc_decoder_with_kenlm.so.
Yep, thanks! I got the CUDA-enabled setup working, though it fails at decoding:
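As a quick self-check before installing the prebuilt package, something like the sketch below could scan the same /proc/cpuinfo output for the two flags. It assumes an x86 Linux machine, where the kernel lists CPU features on lines starting with "flags"; the function name is made up for this example.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx2", "fma")):
    """Return the required flags that no 'flags' line in cpuinfo_text lists."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 Linux prints one "flags : ..." line per logical CPU.
        if key.strip() == "flags":
            present.update(value.split())
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            missing = missing_cpu_flags(handle.read())
    except OSError:
        print("Could not read /proc/cpuinfo (not Linux?)")
    else:
        if missing:
            print("Missing CPU flags:", ", ".join(missing))
        else:
            print("AVX2 and FMA are both reported")
```

If either flag is reported missing, the prebuilt packages will abort with the cpu_feature_guard message shown earlier in this topic.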
Loading the LM will be faster if you build a binary file.
Reading data/lm/lm.binary
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/read_arpa.cc:65 in void lm::ReadARPACounts(util::FilePiece&, std::vector<long unsigned int>&) threw FormatLoadException.
first non-empty line was "version https://git-lfs.github.com/spec/v1" not \data\. Byte: 43
Aborted (core dumped)
lissyx
You have not set up git-lfs, so there is no language model; please read the README for this.
Oh, it's in the FAQ section and in the README; I'm so sorry for not paying full attention to it.
I'll try to make a PR ASAP, thanks.
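The KenLM error above shows the giveaway: the file starts with "version https://git-lfs.github.com/spec/v1", which is the first line of a Git LFS pointer file rather than real model data. A minimal sketch of a check for that, with a made-up helper name and the path taken from the log:

```python
import os

# First bytes of a Git LFS pointer file, as quoted in the KenLM error message.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at path begins with the Git LFS pointer header."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    lm_path = "data/lm/lm.binary"  # path as it appears in the log above
    if not os.path.exists(lm_path):
        print(lm_path, "does not exist")
    elif is_lfs_pointer(lm_path):
        print(lm_path, "is only an LFS pointer; git-lfs has not fetched the file")
    else:
        print(lm_path, "does not look like an LFS pointer")
```

A pointer file is only a few lines of text, so an unexpectedly tiny lm.binary is another hint that git-lfs was not set up when the repository was cloned.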
lissyx
FYI I'm covering similar changes for when we switch to r1.4: we will have a task running on TaskCluster that tests training of the model against upstream TensorFlow with our libctc_decoder_with_kenlm.so.
I installed TensorFlow from source, but it appears that I still haven't successfully added AVX2 support. It would be helpful if someone could provide the ./configure and bazel build commands that will surely work.
lissyx
If you follow TensorFlow's build instructions, there's no way you end up with a package that depends on AVX2 if your system does not have it. Which configure and bazel build commands did you use?
I pip3 installed deepspeech. Then, when I tested it at the command line, the error I got was
"The TensorFlow library was compiled to use AVX2 instructions, but these aren't available on your machine."
So, I pip3 uninstalled deepspeech and tensorflow. Then, I configured and bazel-built TensorFlow. Then, I pip3 installed deepspeech, tried again, and got the same AVX2 error.
My guess is that I'm not running ./configure or the bazel build correctly, so I'm asking what the recommended lines are for a no-GPU configuration.
Does this make sense?
lissyx
It makes sense, but you are doing something different from what started this topic, where the first poster was having issues during training. Basically, any package you pip install is one we built with AVX2 support.
In your case, you need to follow native_client/README.md about building the Python bindings (and also building TensorFlow; it should all be documented there). When doing so, you should have no forced optimization, and thus it should run properly.
So, pip uninstall anything you installed earlier, and follow that.
Besides, out of curiosity, what is your CPU / system?
My confusion was due to the multiple README files in the DeepSpeech repository. You are referencing the instructions in DeepSpeech/native_client/README.md, whereas I was referencing the installation instructions in the README in the root directory. I just noticed that the root README tells me to check out the other README if installation fails. Oh well.