Error occurs when trying to run ./bin/run-ldc93s1.sh

hoangdungpham0703 · July 23, 2019, 8:16am

Hi!

When I tried to run ./bin/run-ldc93s1.sh, I got the error message below. I suspect it is caused by some TensorFlow configuration, but I’m not sure what to fix. Could anyone help me?

Thanks in advance.

+ '[' '!' -f DeepSpeech.py ']'
+ '[' '!' -f data/ldc93s1/ldc93s1.csv ']'
+ '[' -d '' ']'
++ python -c 'from xdg import BaseDirectory as xdg; print(xdg.save_data_path("deepspeech/ldc93s1"))'
+ checkpoint_dir=/Users/dylan/.local/share/deepspeech/ldc93s1
+ export CUDA_VISIBLE_DEVICES=0
+ CUDA_VISIBLE_DEVICES=0
+ python -u DeepSpeech.py --noshow_progressbar --train_files data/ldc93s1/ldc93s1.csv --test_files data/ldc93s1/ldc93s1.csv --train_batch_size 1 --test_batch_size 1 --n_hidden 100 --epochs 200 --checkpoint_dir /Users/dylan/.local/share/deepspeech/ldc93s1
W0723 15:06:40.576691 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.

W0723 15:06:40.834651 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:348: Iterator.output_types (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_types(iterator)`.
W0723 15:06:40.835153 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:349: Iterator.output_shapes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(iterator)`.
W0723 15:06:40.835698 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py:351: Iterator.output_classes (from tensorflow.python.data.ops.iterator_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_classes(iterator)`.
W0723 15:06:41.128334 140735759917952 deprecation.py:506] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0723 15:06:43.652518 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0723 15:06:44.470373 140735759917952 deprecation.py:323] From /Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I0723 15:06:44.472190 140735759917952 saver.py:1280] Restoring parameters from /Users/dylan/.local/share/deepspeech/ldc93s1/train-600
Traceback (most recent call last):
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias not found in checkpoint
	 [[{{node save/RestoreV2}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1286, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at DeepSpeech.py:457) ]]

Original stack trace for 'save/RestoreV2':
  File "DeepSpeech.py", line 844, in <module>
    tfv1.app.run(main)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 828, in main
    train()
  File "DeepSpeech.py", line 457, in train
    checkpoint_saver = tfv1.train.Saver(max_to_keep=FLAGS.max_to_keep)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1296, in restore
    names_to_keys = object_graph_key_mapping(save_path)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1614, in object_graph_key_mapping
    object_graph_string = reader.get_tensor(trackable.OBJECT_GRAPH_PROTO_KEY)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 678, in get_tensor
    return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 844, in <module>
    tfv1.app.run(main)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 828, in main
    train()
  File "DeepSpeech.py", line 475, in train
    loaded = try_loading(session, checkpoint_saver, checkpoint_filename, 'most recent')
  File "DeepSpeech.py", line 392, in try_loading
    saver.restore(session, checkpoint_path)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1302, in restore
    err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias not found in checkpoint
	 [[node save/RestoreV2 (defined at DeepSpeech.py:457) ]]

Original stack trace for 'save/RestoreV2':
  File "DeepSpeech.py", line 844, in <module>
    tfv1.app.run(main)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/Users/dylan/.local/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "DeepSpeech.py", line 828, in main
    train()
  File "DeepSpeech.py", line 457, in train
    checkpoint_saver = tfv1.train.Saver(max_to_keep=FLAGS.max_to_keep)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 825, in __init__
    self.build()
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 875, in _build
    build_restore=build_restore)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 508, in _build_internal
    restore_sequentially, reshape)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 328, in _AddRestoreOps
    restore_sequentially)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 575, in bulk_restore
    return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1696, in restore_v2
    name=name)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "/Users/dylan/miniconda3/envs/deepspeech/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

lissyx · July 23, 2019, 8:56am

Read the error message; it contains the explanation:

Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint.

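You are trying to re-train LDC93S1 on up-to-date master with a stale checkpoint. The train-600 checkpoint in /Users/dylan/.local/share/deepspeech/ldc93s1 was saved from an older graph, so it does not contain the cudnn_lstm/... variables that current master tries to restore.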
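
If you want to confirm the mismatch before touching anything, you can compare the variable names the graph expects with the names actually stored in the checkpoint. This is only a rough sketch: `checkpoint_keys` and `missing_keys` are hypothetical helper names (not part of DeepSpeech), `checkpoint_keys` assumes TensorFlow is installed, and the "stored" name in the example is invented for illustration.

```python
def checkpoint_keys(checkpoint_path):
    """Return the set of variable names stored in a checkpoint."""
    # Imported lazily so the comparison helper below works without TensorFlow.
    import tensorflow as tf
    # tf.train.list_variables returns (name, shape) pairs.
    return {name for name, _shape in tf.train.list_variables(checkpoint_path)}


def missing_keys(expected, available):
    """Names the graph wants to restore that the checkpoint does not contain."""
    return sorted(set(expected) - set(available))


# Illustrative data only: the expected name is copied from the error above,
# the stored name is a made-up stand-in for an older variable layout.
expected = ["cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias"]
stored = ["some_older_layer/bias"]
print(missing_keys(expected, stored))
```

With a real checkpoint you would pass `checkpoint_keys(...)` as the second argument; any name that comes back is one the restore step will fail on.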

hoangdungpham0703 · July 23, 2019, 9:19am

Thank you for your answer.
I found the stale checkpoint, and after I deleted it the script works fine.
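
For anyone hitting the same error who would rather not delete the old checkpoint outright, here is a minimal sketch that renames the checkpoint directory instead, so the next run starts from scratch. `set_aside_checkpoints` is a hypothetical helper of my own, not something DeepSpeech provides, and the path in the usage comment is simply the one from my log.

```python
import shutil
import time
from pathlib import Path


def set_aside_checkpoints(checkpoint_dir):
    """Rename checkpoint_dir to a timestamped backup.

    Returns the backup path, or None if the directory does not exist.
    """
    src = Path(checkpoint_dir).expanduser()
    if not src.is_dir():
        return None  # nothing to move
    backup = src.with_name("{}.stale-{}".format(src.name, int(time.time())))
    shutil.move(str(src), str(backup))
    return backup


# Example (path taken from the log above; adjust to your own setup):
# set_aside_checkpoints("~/.local/share/deepspeech/ldc93s1")
```

Renaming keeps the old files around in case they are still needed with an older checkout; once the new run works, the backup can be removed by hand.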

lissyx · July 23, 2019, 9:20am

That being said, please note you won’t be able to do any serious training on macOS, since there’s no CUDA support on that platform for TensorFlow.