Getting Blank CSV files in the clips folder from TSV Files when trying through import_cv2.py command

ritishadhikari · April 13, 2020, 12:26pm

Hi, this command:

python bin/import_cv2.py --filter_alphabet ./20140421/path/to/some/alphabet.txt ./20140421/scripts/Ib/

is giving me blank CSV files in the clips folder, so I am not able to train on them. I am attaching all the TSV files from the “./20140421/scripts/Ib/” folder.

I am able to convert all the mp3s into their associated wavs, though.

Kindly let me know where I am going wrong; the TSV files were downloaded directly from the Common Voice Irish dataset.

tsv_files.zip (133.0 KB)

lissyx · April 13, 2020, 3:04pm

Please provide:

language
release
output of your run
deepspeech clone ref

ritishadhikari · April 13, 2020, 5:42pm

Language : Irish

Release : ga-IE_4h_2019-12-10

DeepSpeech Clone Ref : ‘git clone https://github.com/mozilla/DeepSpeech’

Output of Run:
python bin/import_cv2.py --filter_alphabet ./20140421/path/to/some/alphabet.txt ./20140421/scripts/Ib/ :

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/train.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/train.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/train.csv
Imported 0 samples.
Skipped 555 samples that failed on transcript validation.
Final amount of imported audio: 0:27:58.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/test.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/test.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/test.csv
Imported 0 samples.
Skipped 486 samples that failed on transcript validation.
Final amount of imported audio: 0:32:41.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/dev.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/dev.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/dev.csv
Imported 0 samples.
Skipped 444 samples that failed on transcript validation.
Final amount of imported audio: 0:24:55.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/validated.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/validated.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/validated.csv
Imported 0 samples.
Skipped 2717 samples that failed on transcript validation.
Final amount of imported audio: 2:29:27.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/other.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/other.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/other.csv
Imported 0 samples.
Skipped 1166 samples that failed on transcript validation.
Final amount of imported audio: 1:15:16.

Please help.

lissyx · April 13, 2020, 5:45pm

Please read your output …

So all your samples get rejected by your filter selection?

Are you sure about what you are doing here?

Likewise, are you sure you are not missing --validate_label_locale?
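
For readers hitting the same "Imported 0 samples" output, here is a minimal sketch of one way to see which transcript characters an alphabet file does not cover. It is not the logic of import_cv2.py itself: the function names are hypothetical, the alphabet format (one symbol per line, lines starting with "#" treated as comments) is an assumption, and the lower-casing step is only a guess at the importer's normalisation.

```python
import csv
from collections import Counter
from pathlib import Path


def load_alphabet(path):
    """Read one symbol per line, skipping empty lines and '#' comments (assumed format)."""
    symbols = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line or line.startswith("#"):
            continue
        symbols.add(line)
    return symbols


def missing_characters(tsv_path, alphabet, column="sentence"):
    """Count characters in the TSV's sentence column that the alphabet lacks."""
    missing = Counter()
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Lower-casing is an assumption about how transcripts are normalised.
            for char in row[column].lower():
                if char not in alphabet:
                    missing[char] += 1
    return missing
```

Calling `missing_characters("./20140421/scripts/Ib/train.tsv", load_alphabet("./20140421/path/to/some/alphabet.txt"))` would list the uncovered characters with their counts. An empty result would suggest the rejections come from somewhere else, for example the label validation step.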

lissyx · April 13, 2020, 5:46pm

https://deepspeech.readthedocs.io/en/master/search.html?q=validate_label_locale&check_keywords=yes&area=default

ritishadhikari · April 13, 2020, 5:56pm

Okay, so I removed --filter_alphabet ./20140421/path/to/some/alphabet.txt and I now get valid CSV output.

But when I run this:

python DeepSpeech.py --train_files ./20140421/scripts/Ib/clips/train.csv --dev_files ./20140421/scripts/Ib/clips/dev.csv --test_files ./20140421/scripts/Ib/clips/test.csv

I get the following error:

I Could not find best validating checkpoint.
I Loading most recent checkpoint from /home/ritish/.local/share/deepspeech/checkpoints/train-0
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 30, in _label_from_string
    return self._str_to_label[string]
KeyError: 'é'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 128, in text_to_char_array
    transcript = alphabet.encode(transcript)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 44, in encode
    res.append(self._label_from_string(char))
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 36, in _label_from_string
    ).with_traceback(e.__traceback__)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 30, in _label_from_string
    return self._str_to_label[string]
KeyError: "ERROR: Your transcripts contain characters (e.g. 'é') which do not occur in '/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt'! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to '/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt'."

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 942, in run_script
    absl.app.run(main)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/ritish/.local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 914, in main
    train()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 592, in train
    train_loss, _ = run_set('train', epoch, train_init_op)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py", line 553, in run_set
    exception_box.raise_if_set()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py", line 117, in raise_if_set
    raise exception # pylint: disable = raising-bad-type
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py", line 125, in do_iterate
    yield from iterable()
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/feeding.py", line 123, in generate_values
    transcript = text_to_char_array(sample.transcript, Config.alphabet, context=sample.sample_id)
  File "/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py", line 136, in text_to_char_array
    raise ValueError('While processing: {}\n{}'.format(context, e))
ValueError: While processing: 20140421/scripts/Ib/clips/common_voice_ga-IE_17638591.wav
"ERROR: Your transcripts contain characters (e.g. 'é') which do not occur in '/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt'! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to '/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt'."

Please help me get this fixed. I am not sure whether the --filter_alphabet option is necessary or not.
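
To make the traceback easier to read, here is a toy illustration of the failure mode it reports: a transcript character that has no entry in the character-to-label table. The `ToyAlphabet` class is a hypothetical stand-in written for this example and is not DeepSpeech's own implementation.

```python
class ToyAlphabet:
    """Toy character-to-label table; a stand-in, not DeepSpeech's own class."""

    def __init__(self, symbols):
        # Each symbol gets the integer label of its position in the list.
        self._str_to_label = {symbol: index for index, symbol in enumerate(symbols)}

    def encode(self, transcript):
        """Map every character to its label, failing on unknown characters."""
        labels = []
        for char in transcript:
            if char not in self._str_to_label:
                raise KeyError(f"character {char!r} is not in the alphabet")
            labels.append(self._str_to_label[char])
        return labels
```

With an alphabet built from only `" "`, `"a"` and `"b"`, encoding `"ab a"` succeeds, while encoding any string containing `"é"` raises the `KeyError`. That matches the situation in the log above, where the default data/alphabet.txt does not list `é`.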

reuben · April 13, 2020, 8:30pm

Again, if you just read the error message, it’s telling you exactly what to do.

lissyx · April 13, 2020, 11:03pm

Have you read the documentation?
Have you read the code you are using?
Have you generated a valid alphabet file?
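
For anyone landing here with the same error, this is a minimal sketch of one way to build an alphabet file from the imported CSVs. It assumes the CSVs have a `transcript` column and that the alphabet file holds one character per line with "#" lines as comments; the function names are hypothetical, and this is not a replacement for util/check_characters.py, which the error message recommends.

```python
import csv


def collect_characters(csv_paths, column="transcript"):
    """Gather the set of distinct characters used in the transcript column."""
    chars = set()
    for path in csv_paths:
        with open(path, encoding="utf-8", newline="") as handle:
            for row in csv.DictReader(handle):
                chars.update(row[column])
    return chars


def write_alphabet(chars, out_path):
    """Write one character per line, preceded by a single comment line."""
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write("# Generated from training transcripts; review before use.\n")
        # Note: a literal '#' in the transcripts would be read back as a
        # comment line; this sketch does not handle that case.
        for char in sorted(chars):
            handle.write(char + "\n")
```

One would pass the train, dev and test CSVs together, for example `write_alphabet(collect_characters(["train.csv", "dev.csv", "test.csv"]), "alphabet.txt")`, and then point training at the resulting file. I believe DeepSpeech.py takes the alphabet path through an `--alphabet_config_path` flag, but please check the training documentation for your version. It is worth reviewing the generated file by hand, since stray punctuation in the transcripts will end up in it too.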
