Errors when I try to use a pre-trained model with checkpoints

I'm trying to run the following on TF 1.14.0 with the DeepSpeech 0.6.1 package:

!python -u DeepSpeech.py --n_hidden 2048 \
  --checkpoint_dir deepspeech-0.6.1-checkpoint --epochs -3 \
  --train_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_train.csv \
  --dev_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_dev.csv \
  --test_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_test.csv \
  --learning_rate 0.00001 \
  --export_dir export_model \
  --summary_dir /root/data/deepspeech/tensorboard

Produces this error

I Loading best validating checkpoint from deepspeech-0.6.1-checkpoint/best_dev-233784
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
File "DeepSpeech.py", line 900, in <module>
absl.app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "DeepSpeech.py", line 873, in main
train()
File "DeepSpeech.py", line 505, in train
load_or_init_graph(session, method_order)
File "/content/DeepSpeech/util/checkpoints.py", line 103, in load_or_init_graph
return _load_checkpoint(session, ckpt_path)
File "/content/DeepSpeech/util/checkpoints.py", line 70, in _load_checkpoint
v.load(ckpt.get_tensor(v.op.name), session=session)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint

How to fix this?

These messages and the files shown in the traceback were only added to master recently, so it doesn't look like you're actually using the v0.6.1 version of the code.
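If you want to double-check which revision your clone is actually on, something like this should work (assuming the clone lives in /content/DeepSpeech, as in your commands):

!git -C /content/DeepSpeech describe --tags
!git -C /content/DeepSpeech log -1 --oneline

A plain git clone without --branch gives you the tip of master, not the 0.6.1 release.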

Is this the wrong way to get the 0.6.1 version of the code?

#installing dependencies
!pip3 install deepspeech-gpu --upgrade

#cloning environment
!git clone https://github.com/mozilla/DeepSpeech

%cd /content/DeepSpeech

%pip install -r requirements.txt

Okay, I fixed that by running:

!git clone https://github.com/mozilla/DeepSpeech --branch v0.6.1

Now I get this

Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [rnn_mode="lstm", seed2=240, seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional"]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.

The release models are trained with CuDNN RNN. You need to enable it with --use_cudnn_rnn. If you want to continue training on a CPU you can instead use --cudnn_checkpoint. See the documentation on the flag: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/util/flags.py#L88
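Roughly, the two options look like this (just a sketch based on the command you already posted; cpu_checkpoint is a placeholder for a fresh checkpoint directory, and the ... stands for the rest of your flags, unchanged):

# on a GPU machine with a working CUDA/CuDNN setup
!python -u DeepSpeech.py --n_hidden 2048 --use_cudnn_rnn \
  --checkpoint_dir deepspeech-0.6.1-checkpoint --epochs 3 ...

# on CPU: load the CuDNN-trained release checkpoint once via --cudnn_checkpoint
# and write the converted checkpoints to a separate directory
!python -u DeepSpeech.py --n_hidden 2048 \
  --cudnn_checkpoint deepspeech-0.6.1-checkpoint \
  --checkpoint_dir cpu_checkpoint --epochs 3 ...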

I have done that, and I also tried writing just -use_cudnn_rnn.

!python -u DeepSpeech.py --n_hidden 2048 \
  --checkpoint_dir deepspeech-0.6.1-checkpoint --epochs 3 \
  --train_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_train.csv \
  --dev_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_dev.csv \
  --test_files /content/drive/My\ Drive/Dataset/en_UK/en_UK_test.csv \
  --learning_rate 0.00001 \
  --export_dir export_model \
  --summary_dir /root/data/deepspeech/tensorboard \
  --use_cudnn_rnn True

but the error is the same:

E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=240]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.

Then it probably means your CUDA/CuDNN setup is broken. Check that you have installed all the dependencies and that tensorflow-gpu is working properly and using the GPUs.
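A quick way to check that from the notebook, using standard TF 1.x calls (nothing DeepSpeech-specific):

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                   # should report 1.14.0
print(tf.test.is_gpu_available())       # should print True on a GPU runtime
print(device_lib.list_local_devices())  # should include a /device:GPU:0 entry

If is_gpu_available() returns False, TensorFlow only registers [CPU, XLA_CPU] devices, which is exactly what your error shows.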

If you face this problem in newer versions, you should use this flag instead:

--train_cudnn True
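For example, on a 0.7+ checkout that would look roughly like this (a sketch; the checkpoint directory name is just a placeholder and the rest of the flags are as above):

!python -u DeepSpeech.py --n_hidden 2048 --train_cudnn True \
  --checkpoint_dir deepspeech-0.7.x-checkpoint --epochs 3 ...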