I followed the documentation for building the environment step by step.
After starting the training, I get the following error:
STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 Traceback (most recent call last):
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
target_list, run_metadata)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node tower_0/conv1d}}]]
[[tower_0/gradients/tower_0/BiasAdd_3_grad/BiasAddGrad/_131]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node tower_0/conv1d}}]]
0 successful operations.
1 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./DeepSpeech.py", line 12, in <module>
ds_train.run_script()
File "/home/ubuntu/ds/lib/python3.6/site-packages/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
absl.app.run(main)
File "/home/ubuntu/ds/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/ubuntu/ds/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/ubuntu/ds/lib/python3.6/site-packages/DeepSpeech/training/deepspeech_training/train.py", line 911, in main
train()
File "/home/ubuntu/ds/lib/python3.6/site-packages/DeepSpeech/training/deepspeech_training/train.py", line 589, in train
train_loss, _ = run_set('train', epoch, train_init_op)
File "/home/ubuntu/ds/lib/python3.6/site-packages/DeepSpeech/training/deepspeech_training/train.py", line 549, in run_set
feed_dict=feed_dict)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
(0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node tower_0/conv1d (defined at /home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[tower_0/gradients/tower_0/BiasAdd_3_grad/BiasAddGrad/_131]]
(1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node tower_0/conv1d (defined at /home/ubuntu/ds/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
1 derived errors ignored.
Original stack trace for 'tower_0/conv1d':
I tried manually setting the TF_FORCE_GPU_ALLOW_GROWTH environment variable to true.
I also tried adding os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
to the top of DeepSpeech.py (after importing os, of course).
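For reference, here is a minimal sketch of that second attempt. The helper name enable_gpu_memory_growth is hypothetical and not part of DeepSpeech. As far as I understand, the variable has to be set before TensorFlow initializes the GPU, so setting it before any TensorFlow import is the simplest way to guarantee that.

```python
import os
import sys


def enable_gpu_memory_growth():
    """Set TF_FORCE_GPU_ALLOW_GROWTH for this process.

    Hypothetical helper (not part of DeepSpeech). Returns True if TensorFlow
    had not been imported yet when the variable was set, False otherwise.
    """
    tensorflow_already_imported = "tensorflow" in sys.modules
    # The value must be a string; 'true' is the value used in the post above.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return not tensorflow_already_imported


# Call this at the very top of the entry script, before anything imports
# TensorFlow (in DeepSpeech.py that would be before the training import).
set_early = enable_gpu_memory_growth()
print("TF_FORCE_GPU_ALLOW_GROWTH =", os.environ["TF_FORCE_GPU_ALLOW_GROWTH"])
print("set before TensorFlow import:", set_early)
```

If the second line prints False, some other module imported TensorFlow first. The setting may still be picked up, but this sketch cannot confirm that.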
Did I miss something during the process?
Steps I ran after cloning DeepSpeech and Mozilla's TensorFlow:
- pip install --upgrade pip==20.0.2 wheel==0.34.2 setuptools==46.1.3
- pip install --upgrade --force-reinstall -e .
- pip uninstall tensorflow -y
- pip install 'tensorflow-gpu==1.15.2'
- python3 generate_lm.py --input_txt …/vocabulary.txt --output_dir output/ --top_k 5000000 --kenlm_bins …/…/…/build/bin/ --arpa_order 3 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
- python3 generate_package.py --alphabet …/alphabet.txt --lm output/lm.binary --vocab output/vocab-5000000.txt --package kenlm.scorer --default_alpha 0.931289039105002 --default_beta 1.1834137581510284
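Since the error points at cuDNN failing to initialize, one check that could be run alongside these steps is whether the CUDA and cuDNN shared libraries can be loaded at all from inside the virtualenv. The sketch below is only a rough diagnostic. The library names are an assumption (a TensorFlow 1.15 GPU build is expected to look for CUDA 10.0 and cuDNN 7), and a library that loads can still be the wrong minor version.

```python
import ctypes

# Assumed sonames for a TensorFlow 1.15 GPU build (CUDA 10.0, cuDNN 7).
# Adjust these if your installation uses different versions.
ASSUMED_LIBRARIES = ("libcudart.so.10.0", "libcudnn.so.7")


def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the library is missing or one of its dependencies is.
        return False


def check_gpu_libraries(names=ASSUMED_LIBRARIES):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_gpu_libraries().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

A "NOT found" result would suggest checking LD_LIBRARY_PATH or the CUDA/cuDNN installation before looking at the training code itself.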