Error when loading the checkpoint from a pre-trained model

Xuetong_Sun · July 5, 2021, 5:14pm

I tried to do transfer learning with the pre-trained model v0.9.3.
OS: Ubuntu 20.04
Graphics card: RTX 3090
Running in the NVIDIA Docker container tensorflow:20.11-tf1-py3
Command and flags:
python3 DeepSpeech.py \
--train_files /workspace/de/clips/train.csv \
--test_files /workspace/de/clips/test.csv \
--dev_files /workspace/de/clips/dev.csv \
--train_batch_size=128 \
--dev_batch_size=128 \
--test_batch_size=128 \
--epochs=200 \
--dropout_rate 0.4 \
--learning_rate 0.0001 \
--load_checkpoint_dir /workspace/DeepSpeech/checkpoint/othersmade/deepspeech-0.9.3-checkpoint \
--save_checkpoint_dir /workspace/DeepSpeech/checkpoint/selfmade/checkpoint_705 \
--drop_source_layers 1 \
--export_dir /workspace/DeepSpeech/data/model \
--export_file_name output_705 \
--export_author_id sun \
--export_model_name 705 \
--export_model_version 1 \
--summary_dir /workspace/DeepSpeech/data/summaries \
--n_hidden 2048 \
--early_stop True \
--es_epochs 10 \
--es_min_delta 0.05 \
--reduce_lr_on_plateau True \
--plateau_epochs 5 \
--plateau_reduction 0.1 \
--alphabet_config_path /workspace/DeepSpeech/data/alphabet.txt \
--scorer_path /workspace/DeepSpeech/data/kenlm_my.scorer

I dropped the last output layer (--drop_source_layers 1), as the documentation describes, and I downloaded the checkpoint folder from the Mozilla DeepSpeech releases on GitHub.

The error log is:
I Loading best validating checkpoint from /workspace/DeepSpeech/checkpoint/deepspeech-0.9.3-checkpoint/best_dev-1466475
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/workspace/DeepSpeech/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/workspace/DeepSpeech/deepspeech_training/train.py", line 954, in main
    train()
  File "/workspace/DeepSpeech/deepspeech_training/train.py", line 529, in train
    load_or_init_graph_for_training(session)
  File "/workspace/DeepSpeech/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/workspace/DeepSpeech/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
  File "/workspace/DeepSpeech/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint
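One way to see why the load fails is to compare the key named in the error with the variable names actually stored in the checkpoint. Below is a minimal sketch, assuming the TensorFlow 1.x installed in the container from the question. The helper names (checkpoint_keys, missing_from_checkpoint, keys_under) are hypothetical; tf.train.list_variables is the standard TensorFlow call that returns a checkpoint's (name, shape) pairs.

```python
def checkpoint_keys(ckpt_path):
    """Return the variable names stored in a TensorFlow checkpoint.

    ckpt_path can be a checkpoint directory or a prefix such as
    ".../best_dev-1466475". TensorFlow is imported inside the function
    so the pure helpers below can be used without it.
    """
    import tensorflow as tf
    # tf.train.list_variables returns a list of (name, shape) pairs.
    return [name for name, _shape in tf.train.list_variables(ckpt_path)]


def missing_from_checkpoint(wanted, available):
    """Return the names in `wanted` that are absent from `available`, sorted."""
    return sorted(set(wanted) - set(available))


def keys_under(keys, prefix):
    """Return the keys that start with `prefix` (e.g. "cudnn_lstm/"), sorted."""
    return sorted(k for k in keys if k.startswith(prefix))
```

With TensorFlow available, keys_under(checkpoint_keys(path), "cudnn_lstm/") lists the LSTM variables the release checkpoint really contains; passing the failing key to missing_from_checkpoint confirms it is absent. The exact stored names depend on how the checkpoint was saved, so they are not reproduced here.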

How can I fix this error?

Thanks a lot for your help.

lissyx · July 5, 2021, 6:04pm

You are missing the --train_cudnn flag (the release checkpoints were trained with CuDNN): https://deepspeech.readthedocs.io/en/r0.9/Flags.html?highlight=train_cudnn#flags
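A minimal sketch of the fix, assuming DeepSpeech.py is launched from the repository root as in the question. BASE_FLAGS repeats only a subset of the question's flags, and build_command / run_training are hypothetical helper names. --train_cudnn is the flag from the answer; --load_cudnn is the alternative the DeepSpeech 0.9 training docs point to when no CUDA-capable GPU is available, shown here as an option rather than as part of the answer.

```python
import subprocess
import sys

# A subset of the flags from the question; extend with the rest as needed.
BASE_FLAGS = [
    "--train_files", "/workspace/de/clips/train.csv",
    "--dev_files", "/workspace/de/clips/dev.csv",
    "--test_files", "/workspace/de/clips/test.csv",
    "--load_checkpoint_dir",
    "/workspace/DeepSpeech/checkpoint/othersmade/deepspeech-0.9.3-checkpoint",
    "--save_checkpoint_dir",
    "/workspace/DeepSpeech/checkpoint/selfmade/checkpoint_705",
    "--drop_source_layers", "1",
    "--n_hidden", "2048",
]


def build_command(flags, use_gpu_cudnn=True):
    """Return argv for DeepSpeech.py with a CuDNN flag appended if not present.

    use_gpu_cudnn=True appends --train_cudnn (train with the CuDNN LSTM).
    use_gpu_cudnn=False appends --load_cudnn (load the CuDNN checkpoint
    into a non-CuDNN graph, e.g. without a CUDA-capable GPU).
    """
    cudnn_flag = "--train_cudnn" if use_gpu_cudnn else "--load_cudnn"
    argv = [sys.executable, "DeepSpeech.py", *flags]
    if cudnn_flag not in argv:
        argv.append(cudnn_flag)
    return argv


def run_training(flags, use_gpu_cudnn=True):
    """Launch training; raises CalledProcessError if DeepSpeech.py fails."""
    subprocess.run(build_command(flags, use_gpu_cudnn), check=True)
```

Calling run_training(BASE_FLAGS) from the repository root starts training with the listed flags plus --train_cudnn.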