Fine-tuning the model on a GPU machine

buxbaum · September 11, 2020, 7:10pm

Hello,
I ran into some issues while fine-tuning on a GPU (CUDA 10.1, cuDNN 7.6).
I want to fine-tune the model starting from the version 0.8.1 checkpoints, using my custom data. Fine-tuning works on the CPU but takes many days, so I decided to use a GPU. After typing:
python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir german_checkpoints/ --epochs 3 --train_files finetuning_data/synthetic_data/my-train.csv --dev_files finetuning_data/synthetic_data/my-dev.csv --test_files finetuning_data/synthetic_data/my_dev.csv --learning_rate 0.0001 --use_allow_growth true --train_cudnn true --test_batch_size=128 --train_batch_size=128 --dev_batch_size=128

I’m getting the following error:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/train.py", line 961, in run_script
    absl.app.run(main)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/train.py", line 933, in main
    train()
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/train.py", line 523, in train
    load_or_init_graph_for_training(session)
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 97, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers)
  File "/home/ubuntu/Desktop/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 1006, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at /home/ubuntu/finetuning1/home//Desktop/DeepSpeech/training/deepspeech_training/train.py:128) with these attrs: [input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=247, seed=4568, dropout=0, num_params=8]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]

     [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
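
Reading the message closely: every CudnnRNNCanonicalToParams kernel listed is registered for device 'GPU', but the registered devices are only CPU, XLA_CPU and XLA_GPU, so no device in this process can run the op. A minimal sketch that pulls both lists out of an error text like the one above and reports kernel devices with no matching registered device. The helper names are hypothetical and not part of TensorFlow or DeepSpeech:

```python
import re
from typing import List


def registered_devices(error_text: str) -> List[str]:
    """Return the device names from the 'Registered devices: [...]' line."""
    m = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    if m is None:
        return []
    return [d.strip() for d in m.group(1).split(",") if d.strip()]


def kernel_devices(error_text: str) -> List[str]:
    """Return the distinct devices named in the 'Registered kernels' entries."""
    # Accept straight quotes and the typographic quotes forums often substitute.
    return sorted(set(re.findall(r"device=['\u2018](\w+)['\u2019]", error_text)))


def unserved_kernel_devices(error_text: str) -> List[str]:
    """Kernel devices for which no device of that type was registered."""
    available = set(registered_devices(error_text))
    return [d for d in kernel_devices(error_text) if d not in available]


error_text = """Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'; T in [DT_DOUBLE]
  device='GPU'; T in [DT_FLOAT]
  device='GPU'; T in [DT_HALF]
"""

print(registered_devices(error_text))       # ['CPU', 'XLA_CPU', 'XLA_GPU']
print(unserved_kernel_devices(error_text))  # ['GPU']
```

A non-empty result means the op's kernels exist only for a device type TensorFlow did not create in this run, which points at device setup rather than at the checkpoint contents.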

Here is the pip list from my environment:

absl-py 0.10.0
alabaster 0.7.12
alembic 1.4.3
anaconda-client 1.7.2
anaconda-project 0.8.3
appdirs 1.4.4
argh 0.26.2
asn1crypto 1.3.0
astor 0.8.1
astroid 2.4.2
astropy 4.0.1.post1
astunparse 1.6.3
atomicwrites 1.3.0
attrdict 2.0.1
attrs 20.2.0
audioread 2.1.8
autopep8 1.4.4
autovizwidget 0.15.0
Babel 2.8.0
backcall 0.1.0
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4 4.9.1
bitarray 1.2.1
bkcharts 0.2
bleach 3.1.4
bokeh 2.0.1
boto 2.49.0
boto3 1.14.37
botocore 1.17.37
Bottleneck 1.3.2
bs4 0.0.1
cachetools 4.1.1
certifi 2020.6.20
cffi 1.14.2
chardet 3.0.4
click 7.1.1
cliff 3.4.0
cloudpickle 1.3.0
clyent 1.2.2
cmaes 0.6.1
cmd2 1.3.9
colorama 0.4.3
colorlog 4.2.1
contextlib2 0.6.0.post1
cryptography 2.8
cycler 0.10.0
Cython 0.29.15
cytoolz 0.10.1
dask 2.14.0
decorator 4.4.2
deepspeech-gpu 0.8.2
deepspeech-training 0.9.0a3 /home/ubuntu/path.../Desktop/DeepSpeech/training

defusedxml 0.6.0
diff-match-patch 20181111
distributed 2.14.0
docutils 0.15.2
ds-ctcdecoder 0.9.0a3
entrypoints 0.3
environment-kernels 1.1.1
et-xmlfile 1.0.1
fastcache 1.1.0
filelock 3.0.12
flake8 3.7.9
Flask 1.1.1
fsspec 0.7.1
future 0.18.2
gast 0.3.3
gevent 1.4.0
glob2 0.7
gmpy2 2.0.8
google-auth 1.20.1
google-auth-oauthlib 0.4.1
google-pasta 0.2.0
greenlet 0.4.15
grpcio 1.32.0
h5py 2.10.0
hdijupyterutils 0.15.0
HeapDict 1.0.1
horovod 0.19.5
html5lib 1.0.1
hypothesis 5.8.3
idna 2.10
imageio 2.8.0
imagesize 1.2.0
importlib-metadata 1.7.0
intervaltree 3.0.2
ipykernel 5.1.4
ipyparallel 6.2.4
ipython 7.13.0
ipython-genutils 0.2.0
ipywidgets 7.5.1
isort 4.3.21
itsdangerous 1.1.0
jdcal 1.4.1
jedi 0.15.2
jeepney 0.4.3
Jinja2 2.11.1
jmespath 0.9.4
joblib 0.16.0
json5 0.9.4
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 6.1.2
jupyter-console 6.1.0
jupyter-core 4.6.3
jupyterlab 1.2.6
jupyterlab-server 1.1.0
Keras 2.3.0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
keyring 21.1.1
kiwisolver 1.1.0
lazy-object-proxy 1.4.3
libarchive-c 2.8
librosa 0.8.0
lief 0.9.0
llvmlite 0.31.0
locket 0.2.0
lxml 4.5.0
Mako 1.1.3
Markdown 3.2.2
MarkupSafe 1.1.1
matplotlib 3.1.3
mccabe 0.6.1
mistune 0.8.4
mkl-fft 1.0.15
mkl-random 1.1.0
mkl-service 2.3.0
mock 4.0.1
more-itertools 8.2.0
mpmath 1.1.0
msgpack 1.0.0
multipledispatch 0.6.0
nb-conda 2.2.1
nb-conda-kernels 2.2.3
nbconvert 5.6.1
nbformat 5.0.4
networkx 2.4
nltk 3.4.5
nose 1.3.7
notebook 6.0.3
numba 0.47.0
numexpr 2.7.1
numpy 1.19.2
numpydoc 0.9.2
oauthlib 3.1.0
olefile 0.46
opencv-python 4.2.0.32
openpyxl 3.0.3
opt-einsum 3.3.0
optuna 2.1.0
opuslib 2.0.0
packaging 20.4
pandas 1.1.2
pandocfilters 1.4.2
parso 0.5.2
partd 1.1.0
path 13.1.0
pathlib2 2.3.5
pathtools 0.1.2
patsy 0.5.1
pbr 5.5.0
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 7.1.2
pip 20.0.2
pkginfo 1.5.0.1
plotly 4.9.0
pluggy 0.13.1
ply 3.11
pooch 1.2.0
prettytable 0.7.2
progressbar2 3.53.1
prometheus-client 0.7.1
prompt-toolkit 3.0.4
protobuf 3.13.0
protobuf3-to-dict 0.1.5
psutil 5.7.0
psycopg2 2.7.5
ptyprocess 0.6.0
py 1.8.1
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.5.0
pycosat 0.6.3
pycparser 2.20
pycrypto 2.6.1
pycurl 7.43.0.5
pydocstyle 4.0.1
pyflakes 2.1.1
pygal 2.4.0
Pygments 2.6.1
pykerberos 1.2.1
pylint 2.5.3
pyodbc 4.0.0-unsupported
pyOpenSSL 19.1.0
pyparsing 2.4.7
pyperclip 1.8.0
pyrsistent 0.16.0
PySocks 1.7.1
pytest 5.4.1
pytest-arraydiff 0.3
pytest-astropy 0.8.0
pytest-astropy-header 0.1.2
pytest-doctestplus 0.5.0
pytest-openfiles 0.4.0
pytest-remotedata 0.3.2
python-dateutil 2.8.1
python-editor 1.0.4
python-jsonrpc-server 0.3.4
python-language-server 0.31.9
python-utils 2.4.0
pytz 2020.1
PyWavelets 1.1.1
pyxdg 0.26
PyYAML 5.3.1
pyzmq 18.1.1
QDarkStyle 2.8
QtAwesome 0.7.0
qtconsole 4.7.2
QtPy 1.9.0
requests 2.24.0
requests-kerberos 0.12.0
requests-oauthlib 1.3.0
resampy 0.2.2
retrying 1.3.3
rope 0.16.0
rsa 4.6
Rtree 0.9.4
ruamel-yaml 0.15.87
s3fs 0.4.0
s3transfer 0.3.3
sagemaker 1.72.0
scikit-image 0.16.2
scikit-learn 0.23.2
scipy 1.4.1
seaborn 0.10.0
SecretStorage 3.1.2
semver 2.10.2
Send2Trash 1.5.0
setuptools 50.3.0
simplegeneric 0.8.1
singledispatch 3.4.0.3
six 1.15.0
smdebug-rulesconfig 0.1.4
snowballstemmer 2.0.0
sortedcollections 1.1.2
sortedcontainers 2.1.0
SoundFile 0.10.3.post1
soupsieve 2.0.1
sox 1.4.0
sparkmagic 0.15.0
Sphinx 3.0.4
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 1.0.3
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.4
sphinxcontrib-websupport 1.2.1
spyder 4.1.2
spyder-kernels 1.9.0
SQLAlchemy 1.3.19
statsmodels 0.11.0
stevedore 3.2.1
sympy 1.5.1
tables 3.6.1
tblib 1.6.0
tensorboard 1.14.0
tensorboard-plugin-wit 1.7.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
tensorflow-serving-api 2.1.0
termcolor 1.1.0
terminado 0.8.3
testpath 0.4.4
threadpoolctl 2.1.0
toml 0.10.1
toolz 0.10.0
tornado 6.0.4
tqdm 4.48.2
traitlets 4.3.3
typed-ast 1.4.1
typing-extensions 3.7.4.1
ujson 1.35
unicodecsv 0.14.1
urllib3 1.25.10
watchdog 0.10.2
wcwidth 0.2.5
webencodings 0.5.1
Werkzeug 1.0.1
wheel 0.35.1
widgetsnbextension 3.5.1
wrapt 1.12.1
wurlitzer 2.0.0
xlrd 1.2.0
XlsxWriter 1.2.8
xlwt 1.3.0
yapf 0.28.0
zict 2.0.0
zipp 3.1.0
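
Worth noting in this list: deepspeech-gpu is 0.8.2 while deepspeech-training and ds-ctcdecoder are 0.9.0a3, and the checkpoints are from 0.8.1. A small sketch for pulling just the relevant versions out of `pip list` output so mismatched release lines stand out; the package selection is simply the ones that appear in this thread, and the function names are hypothetical:

```python
from typing import Dict, Iterable


def pick_versions(pip_list_text: str, names: Iterable[str]) -> Dict[str, str]:
    """Map each requested package to the version shown by `pip list`.

    Each `pip list` row starts with '<name> <version>'; an editable install
    adds its location as a third column, which is ignored here. Matching is
    case-insensitive, and packages that are not listed are simply omitted.
    """
    wanted = {n.lower() for n in names}
    found = {}
    for line in pip_list_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].lower() in wanted:
            found[parts[0].lower()] = parts[1]
    return found


def release_line(version: str) -> str:
    """Keep only major.minor, e.g. '0.9.0a3' -> '0.9'."""
    return ".".join(version.split(".")[:2])


pip_list_text = """deepspeech-gpu 0.8.2
deepspeech-training 0.9.0a3 /home/ubuntu/path.../Desktop/DeepSpeech/training
ds-ctcdecoder 0.9.0a3
tensorflow-gpu 1.14.0
"""

versions = pick_versions(
    pip_list_text,
    ["deepspeech-gpu", "deepspeech-training", "ds-ctcdecoder", "tensorflow-gpu"],
)
for name, version in sorted(versions.items()):
    print(name, version, "->", release_line(version))
```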

I also tried tensorflow-gpu==1.15.2 but got the same error.
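
Since 1.14 and 1.15 fail the same way, one thing worth checking (an assumption, not confirmed in this thread) is whether the CUDA runtime those builds expect can be loaded at all: TensorFlow's tested build configurations list CUDA 10.0 for both tensorflow_gpu-1.14.0 and tensorflow_gpu-1.15.0, while this machine has CUDA 10.1. The sketch below only asks the dynamic loader which of a few library names it can open; the Linux sonames are assumptions to adjust for your system, and the function name is hypothetical:

```python
import ctypes
from typing import Dict, Iterable

# Linux shared-library names to probe (assumed; adjust to your installation).
CANDIDATES = ["libcudart.so.10.0", "libcudart.so.10.1", "libcudnn.so.7"]


def loadable_libraries(sonames: Iterable[str]) -> Dict[str, bool]:
    """Report, for each soname, whether the dynamic loader can open it."""
    result = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be found/loaded
            result[name] = True
        except OSError:
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in loadable_libraries(CANDIDATES).items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

If libcudart.so.10.0 is not found while libcudart.so.10.1 is, that would fit a setup where only XLA devices get registered, but treat it as a lead to confirm against the messages TensorFlow prints at startup.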

Could someone give me a hint?
Thanks in advance

baconator · September 11, 2020, 6:57pm

First I’d try asking over in the DeepSpeech forum…

buxbaum · September 11, 2020, 7:23pm

Oh, that's right!