KeyError: 'wav_filename'

When trying to fine-tune the latest DeepSpeech model, training initializes but then throws three errors, ending with “KeyError: ‘wav_filename’”, even though all of my CSV files have the required header. Any pointers? Also, when training starts, it says:
“I Could not find best validating checkpoint.
I Could not find most recent checkpoint.”
Thanks

Check the CSVs; usually the header ends up in there twice from copying or something similar. Start with just one audio file in each CSV and check that the setup is working.
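A quick way to look for a repeated header is to scan the CSV programmatically. The following is a minimal sketch, not part of DeepSpeech: the function name `find_header_problems` and the sample rows are hypothetical, and the expected column names are taken from the CSV posted in this thread.

```python
import csv
import io

# Column names as they appear in the CSVs posted in this thread.
EXPECTED = ["wav_filename", "wav_filesize", "transcript"]

def find_header_problems(csv_text, expected=EXPECTED):
    """Return a list of human-readable header problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    if rows[0] != expected:
        problems.append(f"unexpected header: {rows[0]!r}")
    # Row numbers are 1-based; the header is row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if row == expected:
            problems.append(f"header repeated on row {row_no}")
    return problems

# Hypothetical sample with the header pasted in twice.
sample = (
    "wav_filename,wav_filesize,transcript\n"
    "/data/a.wav,1000,hello\n"
    "wav_filename,wav_filesize,transcript\n"
    "/data/b.wav,2000,world\n"
)
print(find_header_problems(sample))  # ['header repeated on row 3']
```

To run it against a real file, read the file's text first, for example with `open(path, newline="").read()`, and pass the result in.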

If you don’t have any checkpoints yet, don’t worry, they’ll be created.

@othiele Thank you for reaching out! I did the steps that you recommended and I’m still getting the error. Is there anything I have to set before running the script, such as the .csv locations, besides passing them as flags? Here’s the command I’m using:
python3 -u DeepSpeech.py \
  --train_files ./data/train/train.csv \
  --dev_files ./data/dev/dev.csv \
  --test_files ./data/test/test.csv \
  --train_batch_size 7 \
  --dev_batch_size 3 \
  --test_batch_size 1 \
  --n_hidden 1024 \
  --epochs 64 \
  --early_stop True \
  --dropout_rate 0.30 \
  --learning_rate 0.0005 \
  --export_dir ./results/model_export/ \
  --checkpoint_dir ./results/checkout/ \
  --alphabet_config_path ./data/alphabet.txt \
  "$@"

Are you using the most recent Common Voice dataset? There was an issue with some of the column data so they rebuilt the dataset. Try downloading it again from the site.

Looks good, but share some logs; it is unclear whether training starts. Also post the first couple of lines of train.csv.

@othiele @dabinat
Here’s my train.csv

wav_filename,wav_filesize,transcript
/home/wes/DeepSpeech/data/train/Eggs.wav,89804,tell me about your eczema 
/home/wes/DeepSpeech/data/train/HiAlan.wav,589367,hi alan my name is wesley
/home/wes/DeepSpeech/data/train/ObtainPermission.wav,173930,is it alright if we talk in this non clinical setting
/home/wes/DeepSpeech/data/train/Religious.wav,135198,do you have any religious or spiritual affiliations
/home/wes/DeepSpeech/data/train/TalkAboutIBS.wav,589367,tell me about your ibs
/home/wes/DeepSpeech/data/train/ProbsWithIBS.wav,80664,what problems have you been having with your ibs
/home/wes/DeepSpeech/data/train/EczemaFlareUps.wav,73976,please describe you eczema flare ups  

Then here’s what it says in the log when I run the script

+ python3 -u DeepSpeech.py --train_files ./data/train/train.csv --dev_files ./data/dev/dev.csv --test_files ./data/test/test.csv --train_batch_size 7 --dev_batch_size 3 --test_batch_size 1 --n_hidden 1024 --epochs 64 --early_stop True --dropout_rate 0.30 --learning_rate 0.0005 --export_dir ./results/model_export/ --checkpoint_dir ./results/checkout/ --alphabet_config_path ./data/alphabet.txt
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
         [[{{node tower_0/IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 560, in run_set
    feed_dict=feed_dict)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
         [[node tower_0/IteratorGetNext (defined at /home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'tower_0/IteratorGetNext':
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 955, in run_script
    absl.app.run(main)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 927, in main
    train()
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 473, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 312, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 231, in calculate_mean_edit_distance_and_loss
    batch_filenames, (batch_x, batch_seq_len), batch_y = iterator.get_next()
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 955, in run_script
    absl.app.run(main)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/wes/tmp/deepspeech-train-venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 927, in main
    train()
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 595, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/home/wes/DeepSpeech/training/deepspeech_training/train.py", line 563, in run_set
    exception_box.raise_if_set()
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/helpers.py", line 124, in raise_if_set
    raise exception  # pylint: disable = raising-bad-type
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/helpers.py", line 132, in do_iterate
    yield from iterable()
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/feeding.py", line 102, in generate_values
    samples = samples_from_sources(sources, buffering=buffering, labeled=True)
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 414, in samples_from_sources
    return samples_from_source(sample_sources[0], buffering=buffering, labeled=labeled)
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 385, in samples_from_source
    return CSV(sample_source, labeled=labeled)
  File "/home/wes/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 349, in __init__
    wav_filename = Path(row['wav_filename'])
KeyError: 'wav_filename'

Please use better formatting; check some other posts for examples. This can look good, but right now it is just a mess.

Sadly, I do not have a capable computing environment to download and train with Common Voice. I am hoping to just fine-tune the pre-trained English model to better recognize a few key words.

What about the other files?

That’s weird, it’s the first time I’ve seen that.

test.csv

wav_filename,wav_filesize,transcript
/home/wes/DeepSpeech/data/test/Difficult.wav,73604,your ibs must be difficult to deal with

dev.csv

wav_filename,wav_filesize,transcript
/home/wes/DeepSpeech/data/dev/ManageIBS.wav,70632,what do you do to manage your ibs
/home/wes/DeepSpeech/data/dev/Hygiene.wav,58744,let's talk about the importance of hygiene

Thanks for the formatting, much better.

Could it be that you have just a couple of lines in each CSV? If so, use a batch size of 1, because training reads the data in groups of batch size. If you don’t have that many lines, you get strange error messages.
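The suggestion above can be checked with a few lines of Python. This is only a sketch: the helper name `oversized_batches` is hypothetical, and the numbers are the row counts and batch sizes posted in this thread. Whether an oversized batch actually explains the KeyError is not established here.

```python
def oversized_batches(row_counts, batch_sizes):
    """Return the names of the sets whose batch size exceeds their row count."""
    return [
        name
        for name, rows in row_counts.items()
        if batch_sizes[name] > rows
    ]

# Data rows per CSV and the batch-size flags from the command in this thread.
row_counts = {"train": 7, "dev": 2, "test": 1}
batch_sizes = {"train": 7, "dev": 3, "test": 1}

print(oversized_batches(row_counts, batch_sizes))  # ['dev']
```

With these values only the dev set is flagged: it has two rows but a batch size of 3.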

How large do you recommend the csv files be? As in how many more wav files should I add?

All of them. A typical set has hundreds of thousands of files for train. Splits can be 80/10/10 percent (train/dev/test), but search for “split” here in the forum.
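An 80/10/10 split like the one mentioned above could be sketched as follows. This is an illustration under assumptions, not a DeepSpeech tool: `split_rows`, its parameters, and the sample file names are all hypothetical.

```python
import random

def split_rows(rows, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle rows reproducibly and split them into train, dev and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    # Everything left over goes to the test set.
    test = shuffled[n_train + n_dev:]
    return train, dev, test

# Hypothetical file names standing in for CSV data rows.
rows = [f"/data/clip_{i}.wav" for i in range(100)]
train, dev, test = split_rows(rows)
print(len(train), len(dev), len(test))  # 80 10 10
```

Fixing the seed keeps the split stable between runs, so the same clips stay in the same set while you debug.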

But was that your error?

I’ll try adding more files. The main goal, though, is to fine-tune a pre-trained English model to learn and recognize a few key words such as “eczema” and “IBS”.

@othiele Is there an example of a csv file that you know works so that I can use and just add my own stuff?

This is not helpful of you. This forum is meant to help you and others in the future.

You didn’t state what you changed and whether that worked. You simply ask questions that feel unrelated. Sorry, can’t help you that way. Search the forum first before asking more questions.

@othiele I’ve tried adding more lines to my CSV and setting the batch size to 1, and I’m still getting the same error. I have also tried running it with CSVs that have just one line, which also didn’t work. I’ve researched this issue on this forum and in other resources, and each one says that it’s a formatting issue. I really don’t understand what’s wrong with my CSV formatting.
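One general possibility, which this thread does not confirm as the cause, is that a header can look correct on screen while differing at the byte level, for example because of a UTF-8 byte-order mark or a different delimiter. The sketch below uses only Python's standard `csv` module; `header_keys` and the sample bytes are hypothetical. It shows the column names a `csv.DictReader` would actually see.

```python
import csv
import io

def header_keys(raw_bytes, encoding="utf-8"):
    """Decode CSV bytes and return the column names csv.DictReader sees."""
    text = raw_bytes.decode(encoding)
    return csv.DictReader(io.StringIO(text)).fieldnames

# Hypothetical file contents: identical except for a leading UTF-8 BOM.
clean = b"wav_filename,wav_filesize,transcript\n/data/a.wav,1000,hello\n"
with_bom = b"\xef\xbb\xbf" + clean

print(header_keys(clean))
# ['wav_filename', 'wav_filesize', 'transcript']
print(header_keys(with_bom))
# ['\ufeffwav_filename', 'wav_filesize', 'transcript']
print(header_keys(with_bom, encoding="utf-8-sig"))
# ['wav_filename', 'wav_filesize', 'transcript']
```

In the second case a lookup of `row['wav_filename']` would raise a KeyError even though the header looks right in an editor. Printing `repr()` of the first line of the real file is a simple way to check for this.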

Please start and verify from ./bin/run-ldc93s1.sh

@lissyx is right: start somewhere and debug every error you encounter. Your CSV file looks fine; we can’t help you because you don’t give us any information. What OS, the CSV you actually used, and so much more. Did you even look at

https://discourse.mozilla.org/t/what-and-how-to-report-if-you-need-support/62071/2
