I am using TensorFlow 1.12 with cuDNN 7.5 and CUDA 9.0 on Ubuntu 16.04. When I run run-ldc93s1.sh, I get the following error. Which combination of versions should I use so that it runs without problems?
(venv) root@asr:~/DeepSpeech# ./bin/run-ldc93s1.sh
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ echo Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
Downloading and preprocessing LDC93S1 example data, saving in ./data/ldc93s1.
+ python -u bin/import_ldc93s1.py ./data/ldc93s1
No path "./data/ldc93s1" - creating ...
No archive "./data/ldc93s1/LDC93S1.wav" - downloading...
Progress | | N/A% completed
No archive "./data/ldc93s1/LDC93S1.txt" - downloading...
Progress |#################################################################################################################################################################################| 100% completed
Progress |#################################################################################################################################################################################| 100% completed
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/root/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /root/.local/share/deepspeech/ldc93s1
Traceback (most recent call last):
File "DeepSpeech.py", line 833, in <module>
tf.app.run(main)
File "/root/venv/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 817, in main
train()
File "DeepSpeech.py", line 369, in train
cache_path=FLAGS.train_cached_features_path)
File "/root/DeepSpeech/util/feeding.py", line 92, in create_dataset
.map(entry_to_features, num_parallel_calls=tf.data.experimental.AUTOTUNE)
AttributeError: module 'tensorflow._api.v1.data.experimental' has no attribute 'AUTOTUNE'
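The traceback above shows that TensorFlow 1.12 does not expose tf.data.experimental.AUTOTUNE, while the later run with 1.13 gets past this point. A minimal sketch of a version check follows. The helper name supports_autotune is hypothetical, and the 1.13 threshold is an assumption drawn from these two runs, not from the DeepSpeech documentation.

```python
def supports_autotune(tf_version):
    """Return True if a TensorFlow version string is 1.13 or newer.

    Hypothetical helper: the 1.13 cutoff is an assumption based on the
    two runs in this thread (1.12 fails, 1.13 gets past create_dataset).
    """
    # Only the major and minor components matter for this comparison.
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) >= (1, 13)


for version in ("1.12.0", "1.13.1"):
    print(version, supports_autotune(version))
```

On a real installation the argument would be tf.__version__, checked before launching DeepSpeech.py.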
I reinstalled with Ubuntu 18.04, CUDA 10, cuDNN 7.5 and TensorFlow 1.13. It gets further now, but I get the following error. Do you think this might have something to do with a faulty installation?
(venv) root@asr:~/DeepSpeech# ./bin/run-ldc93s1.sh
+ [ ! -f DeepSpeech.py ]
+ [ ! -f data/ldc93s1/ldc93s1.csv ]
+ [ -d ]
+ python -c from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))
+ checkpoint_dir=/root/.local/share/deepspeech/ldc93s1
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /root/.local/share/deepspeech/ldc93s1
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /root/venv/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
I Initializing variables...
I STARTING Optimization
I Training epoch 0...
Traceback (most recent call last):
File "DeepSpeech.py", line 833, in <module>
tf.app.run(main)
File "/root/venv/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "DeepSpeech.py", line 817, in main
train()
File "DeepSpeech.py", line 511, in train
train_loss = run_set('train', train_init_op)
File "DeepSpeech.py", line 501, in run_set
return total_loss / step_count
ZeroDivisionError: float division by zero
The files were there. Strangely enough, when I ran the script again with the CPU build of TensorFlow it worked fine. Reverting to tensorflow-gpu gave this error again. I did this a few times and it happens every time. I will investigate more and report back on this thread.
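The ZeroDivisionError in the traceback means step_count was still zero when run_set returned, so the training loop completed no steps in that epoch. The sketch below shows one way to make that condition explicit while debugging. The helper mean_loss is hypothetical and is not part of DeepSpeech.py, and it does not explain why the GPU build produced no steps.

```python
def mean_loss(total_loss, step_count):
    """Average the loss over steps, failing loudly if no step ran.

    Hypothetical debugging helper, not DeepSpeech code.
    """
    if step_count == 0:
        # A zero step count means the input pipeline yielded no batches.
        raise RuntimeError(
            "No training steps were executed; check the input pipeline "
            "and whether the GPU run failed before the first batch."
        )
    return total_loss / step_count


print(mean_loss(6.0, 3))
```

Replacing the bare division with a check like this turns the opaque arithmetic error into a message that points at the empty epoch.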