CUDA cuDNN Version Help

Hello all,

I’ve followed the instructions in the documentation, but I’m still getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

DeepSpeech version pulled directly from github as per the documentation.
CUDA 10.1
cuDNN 7.6.5
tensorflow-gpu-1.15.2
(I’ve set everything up as per the documentation v0.8)

According to the TensorFlow site (https://www.tensorflow.org/install/source#tested_build_configurations), tensorflow-gpu 1.15.0 uses CUDA 10.0 and cuDNN 7.4.
I have also tried those versions, but to no avail.

Am I doing something wrong? Please let me know if I need to include any more information.

Thank you!
Matthew

Yes: you are not sharing the logs of the CUDA/cuDNN loading steps from TensorFlow.

Try 7.6?

Thanks for your response!

I am running cuDNN 7.6.5 and still get the same error.

I’m not sure what you mean by logs of CUDA/CUDNN loading steps. The entire error message is this:

Traceback (most recent call last):
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 523, in train
    load_or_init_graph_for_training(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 111, in _load_or_init_impl
    return _initialize_all_variables(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 87, in _initialize_all_variables
    session.run(v.initializer)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

Run with --log_level=0 and share the full output.

Ah, thank you. Here is the full log.

2020-08-05 18:56:25.519628: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-05 18:56:25.542763: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299965000 Hz
2020-08-05 18:56:25.543163: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b102d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-05 18:56:25.543198: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-05 18:56:25.544643: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-05 18:56:25.659081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:25.659409: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ba05b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-05 18:56:25.659425: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro RTX 4000, Compute Capability 7.5
2020-08-05 18:56:25.659532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:25.659767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:25.659827: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659860: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659889: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659919: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659946: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659973: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.662076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:25.662087: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-05 18:56:25.662098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 18:56:25.662103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-08-05 18:56:25.662106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-08-05 18:56:26.346453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:26.346856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:26.346959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347016: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347067: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347115: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347165: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347214: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:26.347241: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W0805 18:56:26.758899 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W0805 18:56:26.759217 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W0805 18:56:26.759424 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0805 18:56:26.957284 139669798070080 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0805 18:56:26.959954 139669798070080 deprecation.py:506] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0805 18:56:26.960354 139669798070080 deprecation.py:506] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/matthew/github/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0805 18:56:27.087682 139669798070080 deprecation.py:323] From /home/matthew/github/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0805 18:56:27.440263 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2020-08-05 18:56:27.576252: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:27.576533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:27.576629: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576712: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576787: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576847: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576907: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576968: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:27.576982: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-05 18:56:27.577017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 18:56:27.577022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-08-05 18:56:27.577026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
D Session opened.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
Traceback (most recent call last):
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [seed=4568, dropout=0, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 523, in train
    load_or_init_graph_for_training(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 111, in _load_or_init_impl
    return _initialize_all_variables(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 87, in _initialize_all_variables
    session.run(v.initializer)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [seed=4568, dropout=0, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

So your CUDA/cuDNN setup is wrong: TensorFlow is not loading the CUDA libraries.
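
For anyone reading a similarly long log, a small script can pull out the library names TensorFlow failed to load. This is only a sketch: the function name `missing_libraries` is made up for illustration, and the regular expression assumes the warning wording shown in the log above ("Could not load dynamic library '...'").

```python
import re

# Matches TensorFlow dso_loader warnings of the form seen in the log above:
#   ... Could not load dynamic library 'libcudart.so.10.0'; dlerror: ...
MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names that failed to load."""
    return sorted(set(MISSING_LIB.findall(log_text)))


if __name__ == "__main__":
    # Three lines copied from the log in this thread.
    sample = (
        "2020-08-05 18:56:25.659827: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
        "Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: "
        "cannot open shared object file: No such file or directory\n"
        "2020-08-05 18:56:25.659860: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] "
        "Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: "
        "cannot open shared object file: No such file or directory\n"
        "2020-08-05 18:56:25.662076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] "
        "Successfully opened dynamic library libcudnn.so.7\n"
    )
    for name in missing_libraries(sample):
        print(name)
```

In this thread every missing name ends in `.so.10.0`, while `libcudnn.so.7` loads fine, which points at the CUDA toolkit libraries rather than cuDNN.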

Yes, I see that it’s looking specifically for 10.0. Does that mean I should be running CUDA 10.0 rather than 10.1 as in the documentation?

The documentation lists the dependencies for inference and links to the TensorFlow website for the training dependencies, because those come from TensorFlow: CUDA 10.0 and cuDNN 7.6 are what you need with r1.15.

So, I don’t know how you set things up, but your tensorflow-gpu does not find CUDA 10.0.
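
One way to check this outside of TensorFlow is to ask the dynamic loader for the same library names directly. The sketch below uses Python's standard `ctypes.CDLL`; the list of names is copied from the warnings in the log above, and the helper names `can_load` and `check_libraries` are made up for illustration. It only shows whether the loader can find and open each file from the current environment, not whether the versions are mutually compatible.

```python
import ctypes

# Library names copied from the dso_loader messages in the log above.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        # Raised when the file is not found or cannot be opened.
        return False


def check_libraries(names=CUDA_10_0_LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


if __name__ == "__main__":
    for name, ok in check_libraries().items():
        print(("OK       " if ok else "MISSING  ") + name)
```

Run it from the same shell and virtual environment used for training, so that it sees the same library search path as TensorFlow does.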

Ok, so it’s the CUDA part that is not right, but it should be 10.0?

Yes, as I said above.

Ok, I am running CUDA 10.0 and now CUDNN 7.5. I can see the files the program is looking for in usr/lib/cuda. Is this the right location for it? I’m not sure how to go about fixing this if CUDA 10.0 is installed.

I can’t tell, it depends on your setup.

Ah, I believe I found the issue: I needed to export LD_LIBRARY_PATH. Maybe usr/lib isn’t the normal path for CUDA? I see lots of references saying it should be usr/local or something similar, but updating LD_LIBRARY_PATH seems to have resolved those errors.
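
To confirm which directory on LD_LIBRARY_PATH actually provides a library, the path can be scanned by hand. This is a minimal sketch; `dirs_containing` is a hypothetical helper name, and it only looks at the colon-separated path it is given, not at the system default directories or the ldconfig cache. Note that the loader reads LD_LIBRARY_PATH when the process starts, so the variable has to be exported before Python is launched.

```python
import os


def dirs_containing(lib_name, search_path):
    """Return the directories in a colon-separated path that contain lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries such as those produced by a trailing colon.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    print("LD_LIBRARY_PATH =", path or "(not set)")
    print(dirs_containing("libcudart.so.10.0", path))
```

An empty list here, combined with the "cannot open shared object file" warnings, suggests the CUDA 10.0 directory is not on the path the process sees.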

Again, it depends on your distro, how you installed CUDA, etc. Maybe a stale ldconfig cache?
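
A rough way to test the stale-cache idea is to look for the library in the output of `ldconfig -p`, which prints the current loader cache. The sketch below assumes the usual glibc output format (`name (flags) => path`); the helper name `cached_paths` is made up for illustration.

```python
import subprocess


def cached_paths(ldconfig_output, lib_name):
    """Return the paths that `ldconfig -p` output lists for lib_name."""
    paths = []
    for line in ldconfig_output.splitlines():
        # Assumed entry format: "\tlibfoo.so.1 (libc6,x86-64) => /usr/lib/libfoo.so.1"
        entry, sep, path = line.strip().partition(" => ")
        if sep and entry.split(" ")[0] == lib_name:
            paths.append(path)
    return paths


if __name__ == "__main__":
    # `ldconfig -p` only reads the cache, so it does not need root.
    out = subprocess.run(
        ["ldconfig", "-p"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    found = cached_paths(out, "libcudart.so.10.0")
    print(found if found else "libcudart.so.10.0 is not in the ldconfig cache")
```

If the library exists on disk but is missing from the cache, re-running `ldconfig` as root after registering its directory is the usual alternative to exporting LD_LIBRARY_PATH.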

Yes, maybe. I’ve put the export LD_LIBRARY_PATH command in the activate file of the virtual environment for now. Thanks so much for your help!
