Which TensorFlow and CUDA versions should I use?

Abhinav · April 9, 2019, 1:51pm

I am using TensorFlow 1.12 with cuDNN 7.5 and CUDA 9.0 on Ubuntu 16.04. Running run-ldc93s1.sh gives the following error. Which version combination should I use for smooth operation?

(venv) root@asr:~/DeepSpeech# ./bin/run-ldc93s1.sh
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ echo Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
+ python -u bin/import_ldc93s1.py ./data/ldc93s1
No path "./data/ldc93s1" - creating ...
No archive "./data/ldc93s1/LDC93S1.wav" - downloading...
Progress |                                                                                                                                                                                 | N/A% completed
No archive "./data/ldc93s1/LDC93S1.txt" - downloading...
Progress |#################################################################################################################################################################################| 100% completed
Progress |#################################################################################################################################################################################| 100% completed
+ [ -d  ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/root/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /root/.local/share/deepspeech/ldc93s1
Traceback (most recent call last):
  File "DeepSpeech.py", line 833, in <module>
    tf.app.run(main)
  File "/root/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 817, in main
    train()
  File "DeepSpeech.py", line 369, in train
    cache_path=FLAGS.train_cached_features_path)
  File "/root/DeepSpeech/util/feeding.py", line 92, in create_dataset
    .map(entry_to_features, num_parallel_calls=tf.data.experimental.AUTOTUNE)
AttributeError: module 'tensorflow._api.v1.data.experimental' has no attribute 'AUTOTUNE'

reuben · April 9, 2019, 1:55pm

You need to use TensorFlow 1.13.1, which requires CUDA 10 by default.
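As a rough way to confirm this before launching a training run, the sketch below compares a version string against the 1.13.1 minimum named above and, if TensorFlow can be imported, looks for the `AUTOTUNE` attribute that the traceback complains about. The names `MIN_TF_VERSION`, `parse_version`, `meets_minimum` and `check_installed_tensorflow` are made up for this example and are not part of DeepSpeech or TensorFlow. It only checks a lower bound, so it says nothing about whether newer releases are compatible.

```python
# Minimum version taken from the reply above; everything else is illustrative.
MIN_TF_VERSION = (1, 13, 1)


def parse_version(version):
    """Turn '1.13.1' or '1.13.0-rc2' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc2'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '1.13'
    return tuple(parts)


def meets_minimum(version, minimum=MIN_TF_VERSION):
    """Lower-bound check only: True if version >= minimum."""
    return parse_version(version) >= minimum


def check_installed_tensorflow():
    """Describe the TensorFlow found in the current environment, if any."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed in this environment"
    # tf.data.experimental may be absent on older releases, so look it up defensively.
    experimental = getattr(tf.data, "experimental", None)
    has_autotune = hasattr(experimental, "AUTOTUNE")
    return "TensorFlow {}: meets minimum={}, AUTOTUNE present={}".format(
        tf.__version__, meets_minimum(tf.__version__), has_autotune
    )


if __name__ == "__main__":
    print(check_installed_tensorflow())
```

Run inside the same virtualenv that DeepSpeech.py uses, since a system-wide interpreter may see a different TensorFlow install.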

Abhinav · April 9, 2019, 1:57pm

Thanks reuben for the quick response.

Abhinav · April 9, 2019, 1:58pm

Not the latest CUDA 10.1?

reuben · April 9, 2019, 2:03pm

https://www.tensorflow.org/install/gpu#software_requirements

Abhinav · April 9, 2019, 2:50pm

I reinstalled with Ubuntu 18.04, CUDA 10, cuDNN 7.5 and TensorFlow 1.13. It seems to work, but I now get the following error. Do you think this might be caused by a faulty installation?

(venv) root@asr:~/DeepSpeech# ./bin/run-ldc93s1.sh 
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d  ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/root/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /root/.local/share/deepspeech/ldc93s1
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
I Initializing variables...
I STARTING Optimization
I Training epoch 0...
Traceback (most recent call last):
  File "DeepSpeech.py", line 833, in <module>
    tf.app.run(main)
  File "/root/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "DeepSpeech.py", line 817, in main
    train()
  File "DeepSpeech.py", line 511, in train
    train_loss = run_set('train', train_init_op)
  File "DeepSpeech.py", line 501, in run_set
    return total_loss / step_count
ZeroDivisionError: float division by zero

reuben · April 9, 2019, 3:59pm

Looks like your train dataset doesn’t have any files in it? Double check the contents of your data/ldc93s1/ldc93s1.csv file.
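One way to do that check, as a minimal sketch: count the rows in the CSV and list any audio files it points to that are not on disk. This assumes the CSV has a header row with a `wav_filename` column, which is how the DeepSpeech importers normally write it; the function name `inspect_training_csv` is invented for this example.

```python
import csv
import os


def inspect_training_csv(csv_path):
    """Return (row_count, missing_wavs) for a DeepSpeech-style CSV.

    Assumes a header row containing a 'wav_filename' column. Relative
    paths are resolved against the current working directory.
    """
    row_count = 0
    missing_wavs = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            row_count += 1
            wav_path = row.get("wav_filename") or ""
            if not os.path.isfile(wav_path):
                missing_wavs.append(wav_path)
    return row_count, missing_wavs
```

Calling `inspect_training_csv("data/ldc93s1/ldc93s1.csv")` from the DeepSpeech checkout returns the number of samples and the list of missing files. A row count of zero would mean the training set is empty, which would be consistent with `step_count` being 0 in the traceback.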

Abhinav · April 11, 2019, 11:08am

The files were there. Strangely enough, when I ran the script again with CPU-only TensorFlow it worked fine. Reverting to tensorflow-gpu gave this error again. I did this a few times and it happens every time. I will investigate further and report back on this thread.