Getting errors while trying to load checkpoint v0.7.0 for transfer learning

Vibhav_Anand · April 28, 2020, 7:25pm

Hi,
I was trying to use transfer learning with the newly released v0.7.0.
I followed the steps in the documentation, but I am getting the following error.
Please help.

I Loading best validating checkpoint from /content/deepspeech-0.7.0-checkpoint/best_dev-732522
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 911, in main
    train()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 511, in train
    load_or_init_graph_for_training(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 97, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint

lissyx · April 28, 2020, 8:04pm

You can’t expect people to help you if you don’t share more information, like the command line you used …

lissyx · April 28, 2020, 8:05pm

You could have searched a little bit more in the doc: https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html#fine-tuning-same-alphabet

Copy/pasting your error on readthedocs reveals your exact problem …
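
For readers hitting the same `NotFoundError`: the traceback says the training graph asks for a tensor name that the released checkpoint does not contain. A minimal sketch of that comparison follows, in plain Python with no TensorFlow. The function names are hypothetical, and the variable names are copied from the logs in this thread. Judging from the `W CUDNN variable not found` lines in the later log, `--load_cudnn` appears to tolerate missing names of this kind, but treat that reading as an assumption and check the linked docs.

```python
def find_missing(graph_vars, checkpoint_keys):
    """Return graph variable names that have no matching key in the checkpoint."""
    available = set(checkpoint_keys)
    return sorted(name for name in graph_vars if name not in available)


def only_optimizer_state(missing):
    """True when every missing name is an Adam slot (optimizer state, not a weight)."""
    return all("Adam" in name for name in missing)


prefix = "cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/"

# Names the training graph wants to restore (taken from the logs above).
graph_vars = [
    prefix + "bias",
    prefix + "bias/Adam",
    prefix + "bias/Adam_1",
    prefix + "kernel",
    prefix + "kernel/Adam",
    prefix + "kernel/Adam_1",
]

# Keys assumed to be present in the checkpoint: weights only, no Adam slots.
checkpoint_keys = [prefix + "bias", prefix + "kernel"]

missing = find_missing(graph_vars, checkpoint_keys)
for name in missing:
    print("missing:", name)
print("only optimizer state missing:", only_optimizer_state(missing))
```

If only optimizer state is missing, re-initializing it loses nothing about the trained weights, which is why this kind of mismatch is recoverable.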

Vibhav_Anand · April 28, 2020, 8:23pm

I am sorry. The exact issue is mentioned in the docs.

After I updated my command, I am getting another error.

Here is the command I used:

python3 '/content/DeepSpeech/DeepSpeech.py' \
    --n_hidden 2048 \
    --train_files   '/content/drive/My Drive/ytd_project/testing/excel files/train2.csv' \
    --dev_files   '/content/drive/My Drive/ytd_project/testing/excel files/dev2.csv' \
    --test_files  '/content/drive/My Drive/ytd_project/testing/excel files/test2.csv' \
    --checkpoint_dir '/content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/' \
    --load_cudnn \
    --alphabet_config_path '/content/drive/My Drive/transfer_learning/alphabet_training.txt'

The error received is:

I Loading best validating checkpoint from /content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/best_dev-732522
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
W CUDNN variable not found: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 911, in main
    train()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 511, in train
    load_or_init_graph_for_training(session)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 132, in load_or_init_graph_for_training
    _load_or_init_impl(session, methods, allow_drop_layers=True)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 97, in _load_or_init_impl
    return _load_checkpoint(session, ckpt_path, allow_drop_layers)
  File "/content/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 70, in _load_checkpoint
    v.load(ckpt.get_tensor(v.op.name), session=session)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
    session.run(self.initializer, {self.initializer.inputs[1]: value})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
    (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (29,) for Tensor 'layer_6/bias/Initializer/zeros:0', which has shape '(77,)'

Am I still making a mistake in the command?

lissyx · April 28, 2020, 8:25pm

Yes. You are changing the alphabet, and that is also explicitly documented …
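
To make the shape mismatch concrete: the output layer (`layer_6`) has one unit per alphabet label plus one extra unit for the CTC blank, so the released English model ends up with 29 units while a 76-label alphabet needs 77. Here is a minimal sketch of that arithmetic. The helper names are hypothetical, and the parsing rules (one label per line, lines starting with `#` are comments, empty lines skipped) are a simplifying assumption about the alphabet file format, not a copy of DeepSpeech's parser.

```python
import string


def count_labels(lines):
    """Count alphabet labels: one per line, skipping '#' comment lines and empty lines."""
    count = 0
    for raw in lines:
        line = raw.rstrip("\r\n")  # strip only the line ending; a lone space is a label
        if line == "" or line.startswith("#"):
            continue
        count += 1
    return count


def output_units(lines):
    """Output layer width: one unit per label plus one for the CTC blank."""
    return count_labels(lines) + 1


# English-style alphabet: space, a-z, apostrophe = 28 labels.
english = ["# comment line\n", " \n"]
english += [c + "\n" for c in string.ascii_lowercase]
english += ["'\n"]

print(count_labels(english))   # 28
print(output_units(english))   # 29
```

To check a real file, pass the open file object, e.g. `output_units(open(path, encoding="utf-8"))`. If the result differs from 29, the last layer of the released checkpoint cannot be loaded as-is.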

Vibhav_Anand · April 28, 2020, 8:28pm

But for transfer learning I will have to pass a new alphabet file, right?
One containing the characters that occur in my audio set.

Vibhav_Anand · April 28, 2020, 8:31pm

Hi, the training has started now. I had missed the --drop_source_layers flag in the command. Thanks a lot.
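
For anyone wondering why `--drop_source_layers 1` resolves the `(29,)` versus `(77,)` error: the idea is that the last layer is not restored from the checkpoint but initialized fresh at the new size, while everything below it is loaded. The sketch below is a simplified illustration of that idea in plain Python, not DeepSpeech's actual implementation. The `LAYER_ORDER` list and the function name are assumptions, and the variable names are modelled on the logs above.

```python
# Assumed bottom-to-top layer order; tags are matched as substrings of variable names.
LAYER_ORDER = ["layer_1", "layer_2", "layer_3", "lstm", "layer_5", "layer_6"]


def split_variables(var_names, drop_source_layers):
    """Split names into (load from checkpoint, re-initialize) for the top N dropped layers."""
    dropped = LAYER_ORDER[-drop_source_layers:] if drop_source_layers > 0 else []
    reinit = [n for n in var_names if any(tag in n for tag in dropped)]
    load = [n for n in var_names if n not in reinit]
    return load, reinit


names = [
    "beta1_power",
    "cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel",
    "layer_5/bias",
    "layer_5/weights",
    "layer_6/bias",
    "layer_6/weights",
]

load, reinit = split_variables(names, drop_source_layers=1)
print("load from checkpoint:", load)
print("initialize fresh:", reinit)
```

With `drop_source_layers=1` only the `layer_6` variables are re-initialized, so their shape comes from the new alphabet rather than from the checkpoint.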

lissyx · April 28, 2020, 8:32pm

Yeah, that is the very next section of the docs I linked, explicitly named “transfer learning”.

Vibhav_Anand · April 29, 2020, 5:59am

Hi,
My command works fine when I use a single --checkpoint_dir,

but it throws an error when I use separate --load_checkpoint_dir and --save_checkpoint_dir.

Here is the command that triggers the issue:

!python3 '/content/DeepSpeech/DeepSpeech.py' \
    --n_hidden 2048 \
    --drop_source_layers 1 \
    --load_checkpoint_dir '/content/drive/My Drive/transfer_learning/deepspeech-0.7.0-checkpoint/' \
    --save_checkpoint_dir '/content/drive/My Drive/transfer_learning/saved_checkpoint' \
    --alphabet_config_path '/content/drive/My Drive/transfer_learning/alphabet_training.txt' \
    --train_files '/content/drive/My Drive/ytd_project/testing/excel files/train2.csv' \
    --dev_files '/content/drive/My Drive/ytd_project/testing/excel files/dev2.csv' \
    --test_files '/content/drive/My Drive/ytd_project/testing/excel files/test2.csv' \
    --load_cudnn

Here is the error:

/content/DeepSpeech
Traceback (most recent call last):
  File "/content/DeepSpeech/DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 939, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 906, in main
    early_training_checks()
  File "/content/DeepSpeech/training/deepspeech_training/train.py", line 895, in early_training_checks
    log_warn('WARNING: You specified different values for --load_checkpoint_dir '
NameError: name 'log_warn' is not defined

Please help me understand where I am going wrong.

lissyx · April 29, 2020, 5:17pm

Yes, you found a bug, and it is now fixed: https://github.com/mozilla/DeepSpeech/pull/2959
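
As to why the single `--checkpoint_dir` run never hit this: Python resolves a name only when the line that uses it executes, so an undefined `log_warn` stays hidden until the branch that warns about differing load and save directories is taken. A minimal sketch of that behaviour, where `early_checks` is a hypothetical stand-in rather than the real `early_training_checks`:

```python
def early_checks(load_dir, save_dir):
    """Hypothetical stand-in for a pre-training sanity check."""
    if load_dir != save_dir:
        # log_warn is deliberately neither imported nor defined in this sketch;
        # the name is looked up only when this line actually runs.
        log_warn("load and save checkpoint directories differ")
    return "checks passed"


# Same directory for loading and saving: the faulty line is never reached.
print(early_checks("ckpt", "ckpt"))

# Different directories: the branch runs and the missing name surfaces.
try:
    early_checks("ckpt_in", "ckpt_out")
except NameError as err:
    print("NameError:", err)
```

Until you are on a version that includes the fix, using one directory for both loading and saving avoids that branch.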
