Training DeepSpeech on GPU fails

I am trying to train a DeepSpeech model by following the steps in the "train your own model" documentation and the DeepSpeech PlayBook, and I have also read through the related issue reports on GitHub for my problem.
I used the following environment:
Nvidia RTX 2070 with 8 GB dedicated memory
Ubuntu 18.04
CUDA 10.0 / cuDNN 7.6.5
tensorflow-gpu 1.15.4

nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.42.01    Driver Version: 470.42.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   48C    P8     7W /  N/A |    386MiB /  7982MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1224      G   /usr/lib/xorg/Xorg                 27MiB |
|    0   N/A  N/A      1334      G   /usr/bin/gnome-shell               69MiB |
|    0   N/A  N/A      1551      G   /usr/lib/xorg/Xorg                173MiB |
|    0   N/A  N/A      1679      G   /usr/bin/gnome-shell               29MiB |
|    0   N/A  N/A      2034      G   /usr/lib/firefox/firefox           12MiB |
|    0   N/A  N/A      2656      G   ...AAAAAAAAA= --shared-files       70MiB |
+-----------------------------------------------------------------------------+

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Training runs correctly on the CPU, but when I add the flag

--train_cudnn True

the following error is raised:

I Enabling automatic mixed precision training.
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
Traceback (most recent call last):
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
self._extend_graph()
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by {{node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams}} with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=257]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "DeepSpeech.py", line 12, in
ds_train.run_script()
File "/home/seham/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
absl.app.run(main)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/seham/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
train()
File "/home/seham/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
load_or_init_graph_for_training(session)
File "/home/seham/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
_load_or_init_impl(session, methods, allow_drop_layers=True)
File "/home/seham/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 112, in _load_or_init_impl
return _initialize_all_variables(session)
File "/home/seham/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 88, in _initialize_all_variables
session.run(v.initializer)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
run_metadata_ptr)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
feed_dict_tensor, options, run_metadata)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/seham/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=257]
Registered devices: [CPU, XLA_CPU]
Registered kernels:

 [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
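The error reports only CPU and XLA_CPU as registered devices, which usually means TensorFlow never managed to load its CUDA libraries at all. As a quick sanity check (just a sketch; the soname versions below are the ones the TF 1.15 GPU wheels link against, so adjust if your setup differs), one can test whether those shared libraries are even loadable from the current environment:

```python
import ctypes

def can_load(libname):
    """Return True if the shared library can be dlopen'ed from the loader path."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        return False

# TensorFlow 1.15 GPU wheels expect CUDA 10.0 and cuDNN 7
for lib in ("libcudart.so.10.0", "libcudnn.so.7"):
    print(lib, "->", "found" if can_load(lib) else "MISSING")
```

If either library prints MISSING inside the training venv, TensorFlow would silently fall back to CPU-only kernel registration, which matches the traceback above.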

I have seen that many people solved this error by fixing their environment setup, but I have tried that many times, with CUDA 10.0, CUDA 10.1, and even CUDA 11.0. (On the other hand, the same issue occurs with the Docker image described in the PlayBook.)

pip list

Package                Version         Location
---------------------  --------------  --------------------------------
absl-py 0.13.0
alembic 1.6.5
appdirs 1.4.4
astor 0.8.1
attrdict 2.0.1
attrs 21.2.0
audioread 2.1.9
beautifulsoup4 4.9.3
bs4 0.0.1
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
charset-normalizer 2.0.3
cliff 3.8.0
cmaes 0.8.2
cmd2 2.1.2
colorama 0.4.4
colorlog 5.0.1
dataclasses 0.8
decorator 5.0.9
deepspeech-training 0.9.3 /home/seham/DeepSpeech/training
ds-ctcdecoder 0.9.3
gast 0.2.2
google-pasta 0.2.0
greenlet 1.1.0
grpcio 1.38.1
h5py 3.1.0
idna 3.2
importlib-metadata 4.6.1
joblib 1.0.1
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
librosa 0.8.1
llvmlite 0.31.0
Mako 1.1.4
Markdown 3.3.4
MarkupSafe 2.0.1
numba 0.47.0
numpy 1.18.5
opt-einsum 3.3.0
optuna 2.8.0
opuslib 2.0.0
packaging 21.0
pandas 1.1.5
pbr 5.6.0
pip 21.1.3
pkg-resources 0.0.0
pooch 1.4.0
prettytable 2.1.0
progressbar2 3.53.1
protobuf 3.17.3
pycparser 2.20
pyparsing 2.4.7
pyperclip 1.8.2
python-dateutil 2.8.2
python-editor 1.0.4
python-utils 2.5.6
pytz 2021.1
pyxdg 0.27
PyYAML 5.4.1
requests 2.26.0
resampy 0.2.2
scikit-learn 0.24.2
scipy 1.5.4
semver 2.13.0
setuptools 49.6.0
six 1.16.0
SoundFile 0.10.3.post1
soupsieve 2.2.1
sox 1.4.1
SQLAlchemy 1.4.21
stevedore 3.3.0
tensorboard 1.15.0
tensorflow 1.15.4
tensorflow-estimator 1.15.1
tensorflow-gpu 1.15.4
termcolor 1.1.0
threadpoolctl 2.2.0
tqdm 4.61.2
typing-extensions 3.10.0.0
urllib3 1.26.6
wcwidth 0.2.5
Werkzeug 2.0.1
wheel 0.34.2
wrapt 1.12.1
zipp 3.5.0
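One thing that stands out in the list above: both `tensorflow` (the CPU-only wheel) and `tensorflow-gpu` are installed at 1.15.4. In TF 1.x both wheels provide the same `tensorflow` import package, so whichever was installed last overwrites the other, and the CPU wheel registers no CUDA kernels, which would produce exactly the "Registered devices: [CPU, XLA_CPU]" error above. A minimal sketch of the check (`conflicting_tf_wheels` is a hypothetical helper, not part of DeepSpeech):

```python
def conflicting_tf_wheels(installed_names):
    """Detect the TF 1.x CPU and GPU wheels installed side by side."""
    names = {n.lower() for n in installed_names}
    return "tensorflow" in names and "tensorflow-gpu" in names

# Package names taken from the pip list above
print(conflicting_tf_wheels(["tensorflow", "tensorflow-gpu", "numpy"]))  # -> True
```

If that is the cause here, uninstalling both packages and reinstalling only `tensorflow-gpu==1.15.4` (ideally in a fresh venv) usually restores the GPU kernels.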

Any help is much appreciated.

Ask the Coqui folks, as they do a lot of the training work currently. Read more in this post.

Is there anything I should edit in the configuration code?

This is really baffling, because CUDA and cuDNN have been tested and are working fine!