CUDA cuDNN Version Help

Matthew_Tan · August 5, 2020, 9:58pm

Hello all,

I’ve followed the instructions on the documentation but I’m still getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [dropout=0, seed=4568, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

DeepSpeech pulled directly from GitHub, as per the documentation.
CUDA 10.1
cuDNN 7.6.5
tensorflow-gpu-1.15.2
(I’ve set everything up as per the documentation v0.8)

According to the TensorFlow site (https://www.tensorflow.org/install/source#tested_build_configurations), tensorflow-gpu 1.15.0 uses CUDA 10.0 and cuDNN 7.4, so I have also tried those versions, but to no avail.

Am I doing something wrong? Please let me know if I need to include any more information.

Thank you!
Matthew
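How to read this error: the CudnnRNNCanonicalToParams op only has kernels registered for device 'GPU', but the "Registered devices" list contains CPU, XLA_CPU and XLA_GPU and no plain GPU device, so TensorFlow has nowhere to place the op. A minimal sketch of that comparison, assuming the error text is available as a string (the function name is hypothetical, not part of TensorFlow or DeepSpeech):

```python
import re


def diagnose_missing_kernel(error_text):
    """Return kernel device types that do not appear in 'Registered devices'."""
    # e.g. "Registered devices: [CPU, XLA_CPU, XLA_GPU]"
    m = re.search(r"Registered devices: \[([^\]]*)\]", error_text)
    devices = {d.strip() for d in m.group(1).split(",")} if m else set()
    # e.g. "device='GPU'; T in [DT_FLOAT]" (accepts straight or curly quotes,
    # since forum software often converts them)
    kernel_devices = set(re.findall(r"device=['‘’](\w+)['‘’]", error_text))
    return sorted(kernel_devices - devices)


if __name__ == "__main__":
    sample = """Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_FLOAT]"""
    print(diagnose_missing_kernel(sample))
```

A non-empty result means the kernels exist but the device they need was never registered, which points at the GPU setup rather than at DeepSpeech itself.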

lissyx · August 5, 2020, 10:39pm

Yes: you are not sharing the TensorFlow logs of the CUDA/cuDNN loading steps.

Try 7.6?

Matthew_Tan · August 5, 2020, 10:47pm

Thanks for your response!

I am running cuDNN 7.6.5 and still the same error.

I’m not sure what you mean by logs of CUDA/CUDNN loading steps. The entire error message is this:

Traceback (most recent call last):
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 523, in train
    load_or_init_graph_for_training(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 111, in _load_or_init_impl
    return _initialize_all_variables(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 87, in _initialize_all_variables
    session.run(v.initializer)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

lissyx · August 5, 2020, 10:50pm

Run with --log_level=0 and share the full output.
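In a full log like the one below, the lines that matter for GPU setup are the dso_loader.cc ones ("Could not load dynamic library ..." / "Successfully opened dynamic library ...") and the "Skipping registering GPU devices..." message. A minimal sketch that pulls those out of a saved log file; the function name and the idea of saving stderr to a file are assumptions for illustration:

```python
import re
import sys

# Patterns match the TensorFlow dso_loader.cc messages shown in this thread.
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")
OPENED = re.compile(r"Successfully opened dynamic library (\S+)")


def summarize_dso_loading(log_text):
    """Summarize which shared libraries TensorFlow failed to load or opened."""
    return {
        "failed": sorted(set(FAILED.findall(log_text))),
        "opened": sorted(set(OPENED.findall(log_text))),
        "gpu_skipped": "Skipping registering GPU devices" in log_text,
    }


if __name__ == "__main__":
    # Usage: python summarize_log.py train.log
    with open(sys.argv[1]) as f:
        print(summarize_dso_loading(f.read()))
```

TensorFlow writes these messages to stderr, so something like `python DeepSpeech.py --log_level=0 <your usual flags> 2> train.log` should capture them.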

Matthew_Tan · August 5, 2020, 10:57pm

Ah, thank you. Here is the full log.

2020-08-05 18:56:25.519628: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-08-05 18:56:25.542763: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2299965000 Hz
2020-08-05 18:56:25.543163: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b102d0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-05 18:56:25.543198: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-05 18:56:25.544643: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-08-05 18:56:25.659081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:25.659409: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4ba05b0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-08-05 18:56:25.659425: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Quadro RTX 4000, Compute Capability 7.5
2020-08-05 18:56:25.659532: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:25.659767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:25.659827: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659860: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659889: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659919: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659946: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.659973: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:25.662076: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:25.662087: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-05 18:56:25.662098: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 18:56:25.662103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-08-05 18:56:25.662106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
2020-08-05 18:56:26.346453: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:26.346856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:26.346959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347016: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347067: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347115: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347165: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347214: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:26.347241: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W0805 18:56:26.758899 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:347: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W0805 18:56:26.759217 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:348: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W0805 18:56:26.759424 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py:350: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
WARNING:tensorflow:
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0805 18:56:26.957284 139669798070080 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0805 18:56:26.959954 139669798070080 deprecation.py:506] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:342: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0805 18:56:26.960354 139669798070080 deprecation.py:506] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/contrib/cudnn_rnn/python/layers/cudnn_rnn.py:345: calling Constant.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /home/matthew/github/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0805 18:56:27.087682 139669798070080 deprecation.py:323] From /home/matthew/github/DeepSpeech/training/deepspeech_training/train.py:245: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0805 18:56:27.440263 139669798070080 deprecation.py:323] From /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py:193: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
2020-08-05 18:56:27.576252: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:983] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-05 18:56:27.576533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties: 
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.38
pciBusID: 0000:01:00.0
2020-08-05 18:56:27.576629: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576712: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576787: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576847: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576907: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576968: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:27.576977: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:27.576982: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-08-05 18:56:27.577017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-08-05 18:56:27.577022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 
2020-08-05 18:56:27.577026: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N 
D Session opened.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
Traceback (most recent call last):
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}}with these attrs: [seed=4568, dropout=0, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/train.py", line 523, in train
    load_or_init_graph_for_training(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 111, in _load_or_init_impl
    return _initialize_all_variables(session)
  File "/home/matthew/github/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 87, in _initialize_all_variables
    session.run(v.initializer)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/matthew/venv/deepspeech/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [seed=4568, dropout=0, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=247]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

	 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

lissyx · August 5, 2020, 10:58pm

Matthew_Tan:

2020-08-05 18:56:26.346959: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347016: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347067: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347115: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347165: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347214: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2020-08-05 18:56:26.347234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-08-05 18:56:26.347241: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1662] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

So your CUDA/cuDNN setup is wrong: TensorFlow is not loading it.
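The same check can be run outside TensorFlow. A minimal sketch using Python's ctypes to dlopen each library named in the log: ctypes.CDLL with a bare soname goes through the system dynamic loader, so it follows LD_LIBRARY_PATH and the ldconfig cache. The list is copied from the log above (CUDA 10.0 / cuDNN 7); the function name is hypothetical.

```python
import ctypes

# Sonames taken from the dso_loader.cc lines in the log above.
REQUIRED = [
    "libcudart.so.10.0", "libcublas.so.10.0", "libcufft.so.10.0",
    "libcurand.so.10.0", "libcusolver.so.10.0", "libcusparse.so.10.0",
    "libcudnn.so.7",
]


def check_loadable(names):
    """Try to dlopen each library; map name -> None on success, error text on failure."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = None
        except OSError as exc:
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for name, err in check_loadable(REQUIRED).items():
        print(("OK       " if err is None else "MISSING  ") + name)
```

The loader reads LD_LIBRARY_PATH when the process starts, so set it in the shell before launching Python; changing os.environ from inside the script will not affect these lookups.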

Matthew_Tan · August 5, 2020, 11:00pm

Yes, I see that it’s looking specifically for 10.0. So does that mean I should be running CUDA 10.0 rather than 10.1 as in the documentation?

lissyx · August 5, 2020, 11:01pm

The documentation lists the dependencies for inference and links to the TensorFlow website for the training dependencies, because those come from TensorFlow: CUDA 10.0 / cuDNN 7.6 is what you need with r1.15.

lissyx · August 5, 2020, 11:01pm

So, I don’t know how you set things up, but your tensorflow-gpu does not find CUDA 10.0.

Matthew_Tan · August 5, 2020, 11:03pm

Ok, so it’s the CUDA part that is not right, but it should be 10.0?

lissyx · August 5, 2020, 11:04pm

Yes, as I said above.

Matthew_Tan · August 5, 2020, 11:08pm

OK, I am now running CUDA 10.0 and cuDNN 7.5. I can see the files the program is looking for in usr/lib/cuda. Is this the right location for them? I’m not sure how to go about fixing this if CUDA 10.0 is installed.

lissyx · August 5, 2020, 11:10pm

I can’t tell, it depends on your setup.

Matthew_Tan · August 5, 2020, 11:13pm

Ah, I believe I found the issue: I needed to export LD_LIBRARY_PATH. Maybe usr/lib isn’t the normal path for CUDA? I see lots of references saying it should be usr/local or similar, but updating LD_LIBRARY_PATH seems to have solved those errors.
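Since the install location varies, one way to build the right LD_LIBRARY_PATH is to look for the exact sonames TensorFlow asked for. A minimal sketch; the candidate directories are assumptions (common locations, not confirmed for this machine) and the function names are hypothetical:

```python
import os

# Hypothetical candidate locations; CUDA install paths vary by distro and installer.
CANDIDATE_DIRS = [
    "/usr/local/cuda-10.0/lib64",
    "/usr/local/cuda/lib64",
    "/usr/lib/cuda/lib64",
    "/usr/lib/x86_64-linux-gnu",
]


def find_library_dirs(sonames, candidates):
    """Return (dirs, missing): dirs holding at least one soname, sonames found nowhere."""
    dirs, found = [], set()
    for d in candidates:
        hits = [s for s in sonames if os.path.exists(os.path.join(d, s))]
        if hits:
            dirs.append(d)
            found.update(hits)
    missing = [s for s in sonames if s not in found]
    return dirs, missing


def ld_library_path_line(dirs):
    """Build a shell export line that prepends dirs without leaving an empty entry."""
    # ${VAR:+:$VAR} only adds ":<old value>" when the variable is already set;
    # an empty entry in LD_LIBRARY_PATH would mean "current directory".
    return 'export LD_LIBRARY_PATH="' + ":".join(dirs) + '${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"'


if __name__ == "__main__":
    names = ["libcudart.so.10.0", "libcublas.so.10.0", "libcudnn.so.7"]
    dirs, missing = find_library_dirs(names, CANDIDATE_DIRS)
    print(ld_library_path_line(dirs))
    if missing:
        print("# still missing:", ", ".join(missing))
```

Matching on the full soname matters here: a CUDA 10.1 install does not provide libcudart.so.10.0, which is why the 10.1 setup was never picked up.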

lissyx · August 5, 2020, 11:17pm

Again, it depends on your distro, how you installed CUDA, etc. Maybe a stale ldconfig cache?
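To check the ldconfig cache theory, `ldconfig -p` prints the libraries the loader knows about without LD_LIBRARY_PATH. A minimal sketch that parses that output and reports which required sonames are absent; the function names are hypothetical, and on some distros ldconfig lives in /sbin, which may not be on a regular user's PATH:

```python
import subprocess


def parse_ldconfig_cache(text):
    """Map soname -> list of paths from `ldconfig -p` output."""
    cache = {}
    for line in text.splitlines():
        # Entries look like: "\tlibfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1"
        if "=>" not in line:
            continue  # skips the "N libs found in cache" header
        left, path = line.split("=>", 1)
        soname = left.strip().split(" ", 1)[0]
        cache.setdefault(soname, []).append(path.strip())
    return cache


def missing_from_cache(sonames, cache):
    return [s for s in sonames if s not in cache]


if __name__ == "__main__":
    # stdout=PIPE / universal_newlines keep this compatible with Python 3.6.
    out = subprocess.run(["ldconfig", "-p"], stdout=subprocess.PIPE,
                         universal_newlines=True, check=True).stdout
    needed = ["libcudart.so.10.0", "libcublas.so.10.0", "libcudnn.so.7"]
    print(missing_from_cache(needed, parse_ldconfig_cache(out)))
```

If the libraries exist on disk but are missing here, the usual system-wide fix is to list their directory in a file under /etc/ld.so.conf.d/ and rerun `sudo ldconfig`, instead of relying on LD_LIBRARY_PATH.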

Matthew_Tan · August 5, 2020, 11:19pm

Yes, maybe. I’ve put the export LD_LIBRARY_PATH command in the virtual environment’s activate file for now. Thanks so much for your help!
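Putting the export in the virtualenv's bin/activate (here presumably /home/matthew/venv/deepspeech/bin/activate, going by the paths in the log) can be done idempotently so that rerunning a setup step does not stack duplicate lines. A minimal sketch; the function name and the CUDA path in the usage line are hypothetical:

```python
def add_to_activate(activate_path, export_line):
    """Append export_line to a virtualenv activate script unless already present.

    Returns True if the file was modified, False if the line was already there.
    """
    with open(activate_path, "r") as f:
        content = f.read()
    if export_line in content:
        return False
    with open(activate_path, "a") as f:
        # Make sure the new line does not get glued onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(export_line + "\n")
    return True


if __name__ == "__main__":
    line = 'export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"'
    print(add_to_activate("/home/matthew/venv/deepspeech/bin/activate", line))
```

One caveat with this approach: the stock `deactivate` function restores PATH and the prompt but not LD_LIBRARY_PATH, so the variable stays set in that shell after deactivating.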
