I Loading best validating checkpoint from deepspeech-0.6.1-checkpoint/best_dev-233784
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "DeepSpeech.py", line 900, in <module>
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 873, in main
    train()
  File "DeepSpeech.py", line 505, in train
    load_or_init_graph(session, method_order)
  File "/content/DeepSpeech/util/checkpoints.py", line 103, in load_or_init_graph
    return _load_checkpoint(session, ckpt_path)
  File "/content/DeepSpeech/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
The messages and file paths in that error were only added to master recently, so it doesn't look like you're actually running the v0.6.1 code.
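If you want to fine-tune the 0.6.1 release checkpoint, check out the matching tag rather than master. A rough sketch, assuming a plain git clone of mozilla/DeepSpeech (adjust paths to your setup):

    cd DeepSpeech
    git fetch --tags
    git checkout v0.6.1
    pip3 install -r requirements.txt   # reinstall the dependencies pinned for that tag

The training code and the checkpoint have to come from the same release, otherwise the variable names in the graph won't match the ones stored in the checkpoint.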
Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [rnn_mode="lstm", seed2=240, seed=4568, dropout=0, num_params=8, T=DT_FLOAT, input_mode="linear_input", direction="unidirectional"]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.
The release models are trained with the CuDNN RNN implementation, so you need to enable it with --use_cudnn_rnn. If you want to continue training on a CPU, you can instead use --cudnn_checkpoint. See the documentation for the flag: https://github.com/mozilla/DeepSpeech/blob/v0.6.1/util/flags.py#L88
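For example, a GPU fine-tuning run might look roughly like this (a sketch only: the CSV paths are placeholders, and --n_hidden 2048 is, if I remember correctly, what the released 0.6.1 model uses):

    python3 DeepSpeech.py \
      --use_cudnn_rnn \
      --checkpoint_dir deepspeech-0.6.1-checkpoint \
      --n_hidden 2048 \
      --train_files train.csv \
      --dev_files dev.csv \
      --test_files test.csv

On a machine without a usable GPU, drop --use_cudnn_rnn and pass --cudnn_checkpoint deepspeech-0.6.1-checkpoint instead, so the CuDNN weights get loaded into the CuDNN-compatible cell.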
E Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
E
E No OpKernel was registered to support Op 'CudnnRNNCanonicalToParams' used by node tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams (defined at DeepSpeech.py:119) with these attrs: [dropout=0, seed=4568, num_params=8, input_mode="linear_input", T=DT_FLOAT, direction="unidirectional", rnn_mode="lstm", seed2=240]
E Registered devices: [CPU, XLA_CPU]
E Registered kernels:
E   <no registered kernels>
E
E [[tower_0/cudnn_lstm/cudnn_lstm/CudnnRNNCanonicalToParams]]
E The checkpoint in deepspeech-0.6.1-checkpoint/best_dev-233784 does not match the shapes of the model. Did you change alphabet.txt or the --n_hidden parameter between train runs using the same checkpoint dir? Try moving or removing the contents of deepspeech-0.6.1-checkpoint/best_dev-233784.
Then it probably means your CUDA/CuDNN setup is broken. Check that you have installed all the dependencies and that tensorflow-gpu is working properly and using the GPUs.
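A quick way to check is to list the devices TensorFlow can see (assuming the tensorflow-gpu 1.14 build that 0.6.1 expects):

    python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
    nvidia-smi

If the first command only shows CPU and XLA_CPU devices (matching the "Registered devices: [CPU, XLA_CPU]" line in your log), TensorFlow isn't seeing the GPU at all, and the second one will tell you whether the NVIDIA driver itself is healthy.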