wave.Error: fmt chunk and/or data chunk missing

I get this error on one of the dev datasets I have:

python3 DeepSpeech.py
--alphabet_config_path data/alphabet.txt
--beam_width 32
--checkpoint_dir $ckpt_dir
--export_dir $ckpt_dir
--scorer $scorer_path
--n_hidden 128
--learning_rate 0.0001
--lm_alpha 0.75
--lm_beta 1.85
--train_batch_size 6
--dev_batch_size 6
--test_batch_size 6
--report_count 10
--epochs 500
--noearly_stop
--noshow_progressbar
--export_tflite
–train_files /datasets/deepspeech_wakeword_dataset/wakeword-train.csv,
/datasets/deepspeech_wakeword_dataset/wakeword-train-other-accents.csv,
/datasets/deepspeech_wakeword_dataset/wakeword-train.csv,
/datasets/india_portal_2may2019-train.csv,
/datasets/india_portal_2to9may2019-train.csv,
/datasets/india_portal_9to19may2019-train.csv,
/datasets/india_portal_19to24may2019-train.csv,
/datasets/brazil_portal_20to26june2019-wakeword-train.csv,
/datasets/brazil_portal_26juneto3july2019-wakeword-train.csv,
/datasets/japan_portal_3july2019-wakeword-train.csv,
/datasets/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-train.csv,
/datasets/alexa-train.csv,
/datasets/alexa-polly-train.csv,
/datasets/alexa-sns.csv,
/datasets/india_portal_ww_data_04282020/custom_train.csv,
/datasets/india_portal_ww_data_05042020/custom_train.csv,
/datasets/india_portal_ww_data_05222020/custom_train.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_train.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_test.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_train.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_test.csv,
/datasets/ww_gtts_data_google_siri/custom_train.csv,
/datasets/ww_gtts_data_google_siri/custom_dev.csv,
/datasets/ww_polly_data_google_siri/custom_train.csv,
/datasets/ww_polly_data_google_siri/custom_test.csv,
/datasets/cv-corpus-5.1-2020-06-22/train.csv
--dev_files /datasets/deepspeech_wakeword_dataset/wakeword-dev.csv,
/datasets/india_portal_2may2019-dev.csv,
/datasets/india_portal_2to9may2019-dev.csv,
/datasets/india_portal_9to19may2019-dev.csv,
/datasets/india_portal_19to24may2019-dev.csv,
/datasets/brazil_portal_20to26june2019-wakeword-dev.csv,
/datasets/brazil_portal_26juneto3july2019-wakeword-dev.csv,
/datasets/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-dev.csv,
/datasets/alexa-dev.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv,
/datasets/india_portal_ww_data_05222020/custom_dev.csv,
/datasets/ww_gtts_data_google_siri/custom_dev.csv,
/datasets/ww_polly_data_google_siri/custom_dev.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv,
/datasets/cv-corpus-5.1-2020-06-22/dev.csv
--test_files /datasets/ww_test_aggregated.csv,
/datasets/alexa-train.csv,
/datasets/alexa-polly-train.csv,
/datasets/alexa-sns.csv,
/datasets/alexa-dev.csv,
/datasets/india_portal_ww_data_04282020/custom_train.csv,
/datasets/india_portal_ww_data_05042020/custom_train.csv,
/datasets/india_portal_ww_data_04282020/custom_dev.csv,
/datasets/india_portal_ww_data_05042020/custom_dev.csv,
/datasets/india_portal_ww_data_04282020/custom_test.csv,
/datasets/india_portal_ww_data_05042020/custom_test.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_train.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_dev.csv,
/datasets/india_portal_ww_data_augmented_04282020/custom_test.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_train.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_dev.csv,
/datasets/india_portal_ww_data_augmented_05042020/custom_test.csv

I Could not find best validating checkpoint.
I Loading most recent checkpoint from /deepspeech_v091/checkpoints/run_training_0.9.1_ww_allww/train-127025
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
I Training epoch 0...
I Finished training epoch 0 - loss: 86.905893
I Validating epoch 0 on /datasets/deepspeech_wakeword_dataset/wakeword-dev.csv...
I Finished validating epoch 0 on /datasets/deepspeech_wakeword_dataset/wakeword-dev.csv - loss: 24.105105
I Validating epoch 0 on /datasets/india_portal_2may2019-dev.csv...
I Finished validating epoch 0 on /datasets/india_portal_2may2019-dev.csv - loss: 139.901111
I Validating epoch 0 on /datasets/india_portal_2to9may2019-dev.csv...
I Finished validating epoch 0 on /datasets/india_portal_2to9may2019-dev.csv - loss: 138.826761
I Validating epoch 0 on /datasets/india_portal_9to19may2019-dev.csv...
I Finished validating epoch 0 on /datasets/india_portal_9to19may2019-dev.csv - loss: 154.308012
I Validating epoch 0 on /datasets/india_portal_19to24may2019-dev.csv...
I Finished validating epoch 0 on /datasets/india_portal_19to24may2019-dev.csv - loss: 135.020843
I Validating epoch 0 on /datasets/brazil_portal_20to26june2019-wakeword-dev.csv...
I Finished validating epoch 0 on /datasets/brazil_portal_20to26june2019-wakeword-dev.csv - loss: 29.007656
I Validating epoch 0 on /datasets/brazil_portal_26juneto3july2019-wakeword-dev.csv...
I Finished validating epoch 0 on /datasets/brazil_portal_26juneto3july2019-wakeword-dev.csv - loss: 48.383508
I Validating epoch 0 on /datasets/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-dev.csv...
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[{{node tower_0/IteratorGetNext}}]]
	 [[tower_0/CTCLoss/_193]]
  (1) Out of range: End of sequence
	 [[{{node tower_0/IteratorGetNext}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/DeepSpeech/training/deepspeech_training/train.py", line 570, in run_set
    feed_dict=feed_dict)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[node tower_0/IteratorGetNext (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[tower_0/CTCLoss/_193]]
  (1) Out of range: End of sequence
	 [[node tower_0/IteratorGetNext (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) ]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'tower_0/IteratorGetNext':
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 976, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 948, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 483, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 316, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 235, in calculate_mean_edit_distance_and_loss
    batch_filenames, (batch_x, batch_seq_len), batch_y = iterator.get_next()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 976, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 948, in main
    train()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 615, in train
    set_loss, steps = run_set('dev', epoch, init_op, dataset=source)
  File "/DeepSpeech/training/deepspeech_training/train.py", line 573, in run_set
    exception_box.raise_if_set()
  File "/DeepSpeech/training/deepspeech_training/util/helpers.py", line 123, in raise_if_set
    raise exception  # pylint: disable = raising-bad-type
  File "/DeepSpeech/training/deepspeech_training/util/helpers.py", line 131, in do_iterate
    yield from iterable()
  File "/DeepSpeech/training/deepspeech_training/util/feeding.py", line 114, in generate_values
    for sample_index, sample in enumerate(samples):
  File "/DeepSpeech/training/deepspeech_training/util/augmentations.py", line 221, in apply_sample_augmentations
    yield from pool.imap(_augment_sample, timed_samples())
  File "/DeepSpeech/training/deepspeech_training/util/helpers.py", line 102, in imap
    for obj in self.pool.imap(fun, self._limit(it)):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
wave.Error: fmt chunk and/or data chunk missing

Didn't you write in the other post that you checked the input wavs? There seem to be some problematic files left.

Check that all wavs are readable, then check that the lengths you give in the csv are correct. If all else fails, try using the reverse and limit flags to find the files.

No, it was not me who wrote in the other post. I have used the same files for training with DeepSpeech 0.6.1. But, as you suggested, I wrote a script (pasted below) that opens all the files, and it ran without any errors. I don't understand what you mean by using the reverse and limit flags to find the files.

import sys
from os import path

from scipy.io import wavfile


def custom_cv_writer(source_dir):
    source_dir = path.abspath(source_dir)

    fin_train = open(source_dir + '/mixed_portal_backups_14_16_17_18_19_visteon_wakeword_dataset-dev.csv')

    for line in fin_train:
        split_line = line.split(',')
        wav_path = split_line[0]
        wav_size = split_line[1]
        transcript = split_line[2]

        # skip the csv header row (wav_filename,wav_filesize,transcript)
        if wav_size == 'wav_filesize':
            continue

        # raises an exception if the wav cannot be read
        samplerate, data = wavfile.read(source_dir + '/' + wav_path)
        print('no problem ' + source_dir + '/' + wav_path)


if __name__ == "__main__":
    custom_cv_writer(sys.argv[1])

I got it: I used wave.open() instead, and that helped me find the file.
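For anyone hitting the same error: the stdlib wave module, which DeepSpeech's audio loading goes through, is stricter about the RIFF chunk layout than scipy.io.wavfile, which is presumably why the scipy-based check above passed. Roughly, the check looks like this (a minimal sketch; find_bad_wavs is just an illustrative name, and it assumes the csv stores absolute wav paths in its first column as in the standard DeepSpeech format):

import sys
import wave

def find_bad_wavs(csv_path):
    # Open every wav referenced in the csv with the same stdlib wave
    # module the trainer uses, and report the ones it cannot parse.
    with open(csv_path) as fin:
        for line in fin:
            wav_path = line.split(',')[0].strip()
            # skip blank lines and the csv header row
            if not wav_path or wav_path == 'wav_filename':
                continue
            try:
                with wave.open(wav_path, 'rb') as wav:
                    wav.getnframes()
            except (wave.Error, EOFError) as e:
                print('corrupt: {} ({})'.format(wav_path, e))

if __name__ == '__main__':
    find_bad_wavs(sys.argv[1])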

Sorry for mixing up names.

Great that you found the corrupt files.
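To answer your question about the flags: if I remember the names right, --limit_dev N restricts the dev set to its first N samples and --reverse_dev feeds it in reverse order (matching flags exist for the train and test sets). So appending e.g. --limit_dev 500, and then --limit_dev 500 --reverse_dev, tells you which end of a set the bad file is in, and halving the limit from there lets you bisect down to it.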