I Loading best validating checkpoint from deepspeech-0.6.1-checkpoint/best_dev-233784
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "DeepSpeech.py", line 900, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 873, in main
    train()
  File "DeepSpeech.py", line 505, in train
    load_or_init_graph(session, method_order)
  File "/content/DeepSpeech/util/checkpoints.py", line 103, in load_or_init_graph
    return _load_checkpoint(session, ckpt_path)
  File "/content/DeepSpeech/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
The messages and file paths in that error were only added to master recently, so it doesn't look like you're actually running the v0.6.1 code.
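If you want to fine-tune the 0.6.1 release checkpoint, check out the matching tag rather than master. A rough sketch, assuming a plain git clone of mozilla/DeepSpeech (adjust paths to your setup):

    cd DeepSpeech
    git fetch --tags
    git checkout v0.6.1
    pip3 install -r requirements.txt   # reinstall the dependencies pinned for that tag

The training code and the checkpoint have to come from the same release, otherwise the variable names in the graph won't match the ones stored in the checkpoint.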
Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [rnn_mode="lstm", seed2=240, seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional"]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.
The release models are trained with the CuDNN RNN implementation, so you need to enable it with --use_cudnn_rnn. If you want to continue training on a CPU, you can instead use --cudnn_checkpoint. See the documentation for the flag: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/util/flags.py#L88
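For example, a GPU fine-tuning run might look roughly like this (a sketch only: the CSV paths are placeholders, and --n_hidden 2048 is, if I remember correctly, what the released 0.6.1 model uses):

    python3 DeepSpeech.py \
      --use_cudnn_rnn \
      --checkpoint_dir deepspeech-0.6.1-checkpoint \
      --n_hidden 2048 \
      --train_files train.csv \
      --dev_files dev.csv \
      --test_files test.csv

On a machine without a usable GPU, drop --use_cudnn_rnn and pass --cudnn_checkpoint deepspeech-0.6.1-checkpoint instead, so the CuDNN weights get loaded into the CuDNN-compatible cell.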
E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=240]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.
Then it probably means your CUDA/CuDNN setup is broken. Check that you have installed all the dependencies and that tensorflow-gpu is working properly and using the GPUs.
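A quick way to check is to list the devices TensorFlow can see (assuming the tensorflow-gpu 1.14 build that 0.6.1 expects):

    python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
    nvidia-smi

If the first command only shows CPU and XLA_CPU devices (matching the "Registered devices: [CPU, XLA_CPU]" line in your log), TensorFlow isn't seeing the GPU at all, and the second one will tell you whether the NVIDIA driver itself is healthy.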