Getting RuntimeError: No transcript data (missing CSV column) when trying to train a model

ritishadhikari · April 9, 2020, 2:33pm

When I run the command below to train on a dataset from Common Voice:

python DeepSpeech.py --train_files ./20140421/scripts/Ib/clips/train.tsv --dev_files ./20140421/scripts/Ib/clips/dev.csv --test_files ./20140421/scripts/Ib/clips/test.csv

I am getting the below error:

 STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
	 [[{{node tower_0/IteratorGetNext}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 552, in run_set
    feed_dict=feed_dict)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
	 [[node tower_0/IteratorGetNext (defined at /home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Original stack trace for 'tower_0/IteratorGetNext':
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 942, in run_script
    absl.app.run(main)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 914, in main
    train()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 474, in train
    gradients, loss, non_finite_files = get_tower_results(iterator, optimizer, dropout_rates)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 312, in get_tower_results
    avg_loss, non_finite_files = calculate_mean_edit_distance_and_loss(iterator, dropout_rates, reuse=i > 0)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 231, in calculate_mean_edit_distance_and_loss
    batch_filenames, (batch_x, batch_seq_len), batch_y = iterator.get_next()
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/home/ritish/.local/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 942, in run_script
    absl.app.run(main)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 914, in main
    train()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 592, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 560, in run_set
    exception_box.raise_if_set()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py", line 117, in raise_if_set
    raise exception  # pylint: disable = raising-bad-type
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py", line 125, in do_iterate
    yield from iterable()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/feeding.py", line 119, in generate_values
    samples = samples_from_files(sources, buffering=buffering, labeled=True)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 363, in samples_from_files
    return samples_from_file(filenames[0], buffering=buffering, labeled=labeled)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 338, in samples_from_file
    return CSV(filename, labeled=labeled)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/sample_collections.py", line 288, in __init__
    raise RuntimeError('No transcript data (missing CSV column)')

I have placed all the CSV files in the same clips folder where the MP3 files are stored. I converted the files from TSV to CSV with Python; they have the following headers, shown here with the first two rows:

|client_id|path|sentence|up_votes|down_votes|age|gender|accent|
|---|---|---|---|---|---|---|---|
|181b63f0202ba1fd0594b5a55c4a9bb53429c87b2d2bda74f0370273a94bcffeaf810c3b0838481915f9dfbe59011ea40c5f9281b6a858c7f60f4a644c148d7f|common_voice_ga-IE_18183675.mp3|Gura fada buan sibh agus go raibh míle maith agaibh go léir|2|0|twenties|male|connachta|
|181b63f0202ba1fd0594b5a55c4a9bb53429c87b2d2bda74f0370273a94bcffeaf810c3b0838481915f9dfbe59011ea40c5f9281b6a858c7f60f4a644c148d7f|common_voice_ga-IE_18183677.mp3|Is í ding di féin a scoileann an dair|2|1|twenties|male|connachta|

I have taken the data from the following link:
https://voice.mozilla.org/en/datasets

Kindly let me know where I am going wrong.

lissyx · April 9, 2020, 2:42pm

Please use import_cv2.py as documented: Training Your Own Model — DeepSpeech 0.6.1 documentation

ritishadhikari · April 9, 2020, 2:47pm

I had already used that line,

python bin/import_cv2.py --filter_alphabet 20140421/path/to/some/alphabet.txt 20140421/path/to/extracted/language/archive

but I got this response -

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.

lissyx · April 9, 2020, 2:49pm

And ?

This is a documented warning to ensure you know what you are doing. But import will work …

ritishadhikari · April 9, 2020, 2:50pm

Okay, so that line worked. But then why do I get the error when I run this command:

python DeepSpeech.py --train_files ./20140421/scripts/Ib/clips/train.csv --dev_files ./20140421/scripts/Ib/clips/dev.csv --test_files ./20140421/scripts/Ib/clips/test.csv

Please suggest

lissyx · April 9, 2020, 2:53pm

Please be clear. You shared CSV content that is NOT what the training code expects.
Also, the command line you shared earlier mentions train.tsv instead of train.csv.

So far, it seems obvious you are still passing wrong data.
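
To make the mismatch concrete, here is a minimal sketch that compares a CSV header against the three columns the DeepSpeech importers normally write (`wav_filename`, `wav_filesize`, `transcript`). The helper name `missing_columns` is made up for this illustration and is not part of DeepSpeech; the first header is the one from the table in the first post.

```python
import csv
import io

# Columns the training CSVs are normally expected to carry
# (the layout written by the DeepSpeech importers).
EXPECTED_COLUMNS = ("wav_filename", "wav_filesize", "transcript")


def missing_columns(csv_text):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in EXPECTED_COLUMNS if name not in header]


# Header of the hand-converted Common Voice file from the first post.
common_voice_header = "client_id,path,sentence,up_votes,down_votes,age,gender,accent\n"
# Header in the importer's layout.
importer_header = "wav_filename,wav_filesize,transcript\n"

print(missing_columns(common_voice_header))
print(missing_columns(importer_header))
```

The hand-converted file has `path` and `sentence` where the training code looks for `wav_filename` and `transcript`, which matches the "No transcript data (missing CSV column)" error above.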

lissyx · April 9, 2020, 2:54pm

So, what error does that line produce? Can you share the standard output of bin/import_cv2.py? Have you verified the .csv files? What language is this?

ritishadhikari · April 9, 2020, 3:00pm

This is the Irish language.

When I use bin/import_cv2.py, I get the following error:

import_cv2.py: error: the following arguments are required: tsv_dir

ritishadhikari · April 9, 2020, 3:00pm

usage: import_cv2.py [-h] [--validate_label_locale VALIDATE_LABEL_LOCALE]
                     [--audio_dir AUDIO_DIR]
                     [--filter_alphabet FILTER_ALPHABET] [--normalize]
                     [--space_after_every_character]
                     tsv_dir
import_cv2.py: error: the following arguments are required: tsv_dir

lissyx · April 9, 2020, 3:13pm

Well, pass the argument ?
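
For readers unfamiliar with this kind of usage message: `tsv_dir` is a positional argument, so it has to be given as a bare path after any `--` options. The sketch below is a stand-in parser rebuilt only from the usage text quoted above, not the actual source of import_cv2.py; the option types and the `store_true` flags are assumptions.

```python
import argparse


def build_parser():
    """Stand-in parser mirroring the usage message above (not the real script)."""
    parser = argparse.ArgumentParser(prog="import_cv2.py")
    parser.add_argument("--validate_label_locale")
    parser.add_argument("--audio_dir")
    parser.add_argument("--filter_alphabet")
    # Assumed to be simple on/off switches, since the usage shows no value for them.
    parser.add_argument("--normalize", action="store_true")
    parser.add_argument("--space_after_every_character", action="store_true")
    # Positional argument: required, and given without a leading "--".
    parser.add_argument("tsv_dir")
    return parser


args = build_parser().parse_args(
    ["--filter_alphabet", "./20140421/path/to/some/alphabet.txt", "./20140421/scripts/Ib/"]
)
print(args.tsv_dir)
```

Leaving out the trailing directory makes argparse exit with "the following arguments are required: tsv_dir", which is the error reported above.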

ritishadhikari · April 10, 2020, 7:26am

Thanks for your reply. After passing the argument and running the import, I now get a ZeroDivisionError when I run the DeepSpeech.py command:

(deepspeech) ritish@ritish-VirtualBox:~/DeepSpeech/DeepSpeech$ python DeepSpeech.py --train_files ./20140421/scripts/Ib/clips/train.csv --dev_files ./20140421/scripts/Ib/clips/dev.csv --test_files ./20140421/scripts/Ib/clips/test.csv
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000                                                                                
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000 | Dataset: ./20140421/scripts/Ib/clips/dev.csv                                 
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 942, in run_script
    absl.app.run(main)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 914, in main
    train()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 607, in train
    dev_loss = dev_loss / total_steps
ZeroDivisionError: float division by zero

Please let me know where I am erring. Your response is solicited.

lissyx · April 10, 2020, 7:48am

Verify your dev.csv file: either it is empty or it is too small for the batch size.
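
A quick way to check this is to count how many batches the file would yield. The sketch below is only an illustration: `count_steps` is a hypothetical helper, the sample rows are invented, and whether an incomplete final batch is dropped is an assumption exposed as a parameter rather than a statement about DeepSpeech's behaviour.

```python
import csv
import io


def count_steps(csv_text, batch_size, drop_remainder=True):
    """Count the batches a CSV would yield for a given batch size."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    full_batches, leftover = divmod(len(rows), batch_size)
    if leftover and not drop_remainder:
        full_batches += 1
    return full_batches


# A file that contains only the header row, like the blank CSVs described later.
header_only = "wav_filename,wav_filesize,transcript\n"
# Two invented rows, fewer than the batch size used below.
two_rows = header_only + "a.wav,1000,first line\nb.wav,1200,second line\n"

print(count_steps(header_only, batch_size=1))
print(count_steps(two_rows, batch_size=4))
print(count_steps(two_rows, batch_size=4, drop_remainder=False))
```

When the step count is zero, the `dev_loss / total_steps` line in the traceback above divides by zero.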

ritishadhikari · April 12, 2020, 2:46pm

tsv_files.zip (133.0 KB)

Hi, this command:

python bin/import_cv2.py --filter_alphabet ./20140421/path/to/some/alphabet.txt ./20140421/scripts/Ib/

is giving me blank CSV files in the clips folder, so I am not able to train on them. I am attaching all the TSV files that are in the “./20140421/scripts/Ib/” folder.

Kindly let me know where I am going wrong, as the TSV files were downloaded directly from the Common Voice Irish dataset.
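
The thread does not confirm the cause of the blank files. One possibility worth ruling out is that the file passed to `--filter_alphabet` lacks the accented Irish vowels, so every sentence gets filtered out. The sketch below illustrates that idea only: `filtered_out` is a hypothetical helper, both alphabets are invented for the example, and the lowercasing step is an assumption rather than a description of what the importer does.

```python
import unicodedata


def filtered_out(sentences, alphabet):
    """Return the sentences that contain a character outside the alphabet."""
    allowed = set(unicodedata.normalize("NFC", alphabet))
    return [
        s for s in sentences
        if any(ch not in allowed for ch in unicodedata.normalize("NFC", s))
    ]


# The two sample sentences from the first post, lowercased for the comparison.
sentences = [
    "Gura fada buan sibh agus go raibh míle maith agaibh go léir".lower(),
    "Is í ding di féin a scoileann an dair".lower(),
]

# Invented alphabets: plain ASCII letters, then the same plus accented vowels.
ascii_alphabet = "abcdefghijklmnopqrstuvwxyz' "
irish_alphabet = ascii_alphabet + "áéíóú"

print(len(filtered_out(sentences, ascii_alphabet)))
print(len(filtered_out(sentences, irish_alphabet)))
```

With the ASCII-only alphabet both sample sentences are rejected because of "í" and "é"; with the extended alphabet neither is.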
