Errors when loading the v0.7.0 checkpoint for transfer learning

Hi,
I was trying to use transfer learning in the newly released v0.7.0.
I followed the steps in the documentation, but I am getting the following error.
Please help.

I Loading best validating checkpoint from /content/deepspeech-0.7.0-checkpoint/best_dev-732522
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 911, in main
    train()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 511, in train
    load_or_init_graph_for_training(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 97, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint

You can’t expect people to help you if you don’t share more information, like the command line you used …

You could have searched the docs a little more: https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html#fine-tuning-same-alphabet
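Some context on what that docs section addresses: the NotFoundError above says the checkpoint has no Adam optimizer slot (`.../bias/Adam`) for the CuDNN-compatible LSTM cell. In the second log of this thread, run with `--load_cudnn`, those slots are reported as "CUDNN variable not found" and loading carries on. That suggests they are initialized fresh instead of being read from the checkpoint. The sketch below shows that partitioning on plain sets of variable names. The function name, and the rule that only Adam slots may be missing, are assumptions for illustration. This is not DeepSpeech's actual code.

```python
CELL = "cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell"


def partition_for_load_cudnn(graph_vars, ckpt_keys):
    """Split graph variable names into (load, init) sets.

    Variables present in the checkpoint are loaded. Missing ones are
    initialized fresh, but only if they are Adam optimizer slots
    (names containing "Adam"); any other missing variable is an error.
    """
    available = set(ckpt_keys)
    missing = {name for name in graph_vars if name not in available}
    not_adam = sorted(name for name in missing if "Adam" not in name)
    if not_adam:
        raise ValueError(f"variables missing from checkpoint: {not_adam}")
    load = set(graph_vars) - missing
    return load, missing


# Toy example: the checkpoint has the LSTM weights but not their Adam slots.
graph = [f"{CELL}/bias", f"{CELL}/bias/Adam", f"{CELL}/kernel",
         f"{CELL}/kernel/Adam_1", "layer_1/bias"]
ckpt = [f"{CELL}/bias", f"{CELL}/kernel", "layer_1/bias"]
load, init = partition_for_load_cudnn(graph, ckpt)
for name in sorted(init):
    print("initialize fresh:", name)
```

Treating any other missing variable as fatal keeps a real mismatch from being silently papered over. A wrong layer size or a wrong checkpoint directory would still fail loudly.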

Copy/pasting your error into the search on Read the Docs reveals your exact problem …

I am sorry. The exact issue is mentioned in the docs.

After I updated my command, I am getting another error.

Here is the command I used:

python3 '/content/DeepSpeech/DeepSpeech.py' \
    --n_hidden 2048 \
    --train_files   '/content/drive/My Drive/ytd_project/testing/excel files/train2.csv' \
    --dev_files   '/content/drive/My Drive/ytd_project/testing/excel files/dev2.csv' \
    --test_files  '/content/drive/My Drive/ytd_project/testing/excel files/test2.csv' \
    --checkpoint_dir '/content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/' \
    --load_cudnn \
    --alphabet_config_path '/content/drive/My Drive/transfer_learning/alphabet_training.txt'

The error received is:

I Loading best validating checkpoint from /content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/best_dev-732522
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 911, in main
    train()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 511, in train
    load_or_init_graph_for_training(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 97, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(77,)'

Am I still making a mistake in the command?

Yes. You are changing the alphabet, and that is also explicitly documented …
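The ValueError above comes from the size of the output layer. In DeepSpeech, `layer_6` has one unit per alphabet symbol plus one for the CTC blank label. The checkpoint's shape `(29,)` therefore matches the 28-symbol English release alphabet (space, a to z, apostrophe). The `(77,)` expected by the new graph implies a 76-symbol alphabet file. The sketch below computes that number from an alphabet file's text. It assumes a simplified format: one symbol per line, lines starting with `#` are comments, and empty lines are ignored. The helper names are hypothetical.

```python
def count_symbols(alphabet_text):
    """Count the symbols in a DeepSpeech-style alphabet file's text.

    Simplified, assumed format: one symbol per line, lines starting with
    "#" are comments, empty lines are ignored. Lines are not stripped of
    spaces, because the space symbol is a line holding a single " ".
    """
    count = 0
    for line in alphabet_text.split("\n"):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if not line or line.startswith("#"):
            continue
        count += 1
    return count


def output_units(alphabet_text):
    """Size of the output layer (layer_6): one unit per symbol + CTC blank."""
    return count_symbols(alphabet_text) + 1


def needs_new_output_layer(checkpoint_units, alphabet_text):
    """True if the checkpoint's output layer cannot be reused as-is."""
    return output_units(alphabet_text) != checkpoint_units


# English release alphabet: space, a-z, apostrophe -> 28 symbols.
english = "# English alphabet\n \n" + "\n".join("abcdefghijklmnopqrstuvwxyz") + "\n'\n"
print(output_units(english))                # 29, the checkpoint's layer_6 size
print(needs_new_output_layer(29, english))  # False
```

If `needs_new_output_layer` is true for your alphabet, the pretrained output layer cannot be loaded as-is and has to be re-initialized.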

But for transfer learning I have to pass a new alphabet file, right?
One that contains the characters that appear in my audio dataset.

Hi, training has started now. I had missed the --drop_source_layers flag. Thanks a lot.


Yeah, it is in the very next section of the docs page I linked, explicitly named “transfer learning”.
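`--drop_source_layers 1` fixes the shape error above. It leaves the mismatched output layer out of the checkpoint load and initializes it from scratch, while the other layers keep their pretrained weights. Below is a minimal sketch of that selection. It assumes the layer order seen in the variable names in the logs: `layer_1` to `layer_3`, the recurrent `cudnn_lstm` layer, then `layer_5` and `layer_6`. It matches on the first path component of each name. `split_for_transfer` is a hypothetical name, and clamping so the first layer is always loaded is an assumption of this sketch.

```python
# Layer order as it appears in the variable names in the logs;
# "cudnn_lstm" is the recurrent layer between layer_3 and layer_5.
LAYERS = ["layer_1", "layer_2", "layer_3", "cudnn_lstm", "layer_5", "layer_6"]


def split_for_transfer(var_names, drop_source_layers):
    """Return (loaded, reinitialized) lists of variable names.

    Variables belonging to the last `drop_source_layers` layers are
    re-initialized instead of loaded. The count is clamped so that at
    least the first layer is always loaded (an assumption of this sketch).
    """
    n = max(0, min(drop_source_layers, len(LAYERS) - 1))
    dropped = set(LAYERS[len(LAYERS) - n:])
    reinit = [v for v in var_names if v.split("/")[0] in dropped]
    loaded = [v for v in var_names if v.split("/")[0] not in dropped]
    return loaded, reinit


names = ["global_step", "layer_5/bias", "layer_6/bias",
         "layer_6/bias/Adam", "layer_6/weights"]
loaded, reinit = split_for_transfer(names, 1)
print("loaded:", loaded)
print("re-initialized:", reinit)
```

Dropping only the last layer keeps as much of the pretrained acoustic model as possible. Larger values trade that for more freedom to adapt to the new data.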


Hi,
My command works fine when I use a single --checkpoint_dir, but it throws errors when I use separate --load_checkpoint_dir and --save_checkpoint_dir.

Here is the command that fails:

!python3 '/content/DeepSpeech/DeepSpeech.py' \
    --n_hidden 2048 \
    --drop_source_layers 1 \
    --load_checkpoint_dir '/content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/' \
    --save_checkpoint_dir '/content/drive/My Drive/transfer_learning/saved_checkpoint' \
    --alphabet_config_path '/content/drive/My Drive/transfer_learning/alphabet_training.txt' \
    --train_files '/content/drive/My Drive/ytd_project/testing/excel files/train2.csv' \
    --dev_files '/content/drive/My Drive/ytd_project/testing/excel files/dev2.csv' \
    --test_files '/content/drive/My Drive/ytd_project/testing/excel files/test2.csv' \
    --load_cudnn

Here is the error:

/content/DeepSpeech
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 906, in main
    early_training_checks()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 895, in early_training_checks
    log_warn('WARNING: You specified different values for --load_checkpoint_dir '
NameError: name 'log_warn' is not defined

Please help me understand where I am going wrong.

Yes, this is a bug you found, and it is now fixed: https://github.com/mozilla/DeepSpeech/pull/2959
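Note that the failing line in the traceback is only a warning. `early_training_checks` was trying to log that `--load_checkpoint_dir` and `--save_checkpoint_dir` differ. The crash is the NameError, raised because `log_warn` is not defined where it is called. Below is a hypothetical stand-alone version of such a check. It uses the standard `logging` module, so the logging call is always defined. The function name, the path normalization and the message text are assumptions. This is not the code in train.py or in the linked PR.

```python
import logging
import os

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("checkpoint_check")  # hypothetical logger name


def check_checkpoint_dirs(load_dir, save_dir):
    """Warn, but do not fail, if the load and save directories differ.

    Returns True when a warning was logged. Paths are normalized first so
    that a trailing slash alone does not count as a difference.
    """
    if os.path.normpath(load_dir) != os.path.normpath(save_dir):
        log.warning("--load_checkpoint_dir %r differs from --save_checkpoint_dir %r",
                    load_dir, save_dir)
        return True
    return False


base = "/content/drive/My Drive/transfer_learning/"
check_checkpoint_dirs(base + "deepspeech-0.7.0-checkpoint/", base + "saved_checkpoint")
```

Because the check only warns, a run with separate load and save directories continues as intended once the logging call itself works.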
