Getting blank CSV files in the clips folder from TSV files when running import_cv2.py

Hi, this command:

python bin/import_cv2.py --filter_alphabet ./20140421/path/to/some/alphabet.txt ./20140421/scripts/Ib/

is giving me blank CSV files in the clips folder, so I am not able to train on them. I am attaching all the TSV files from the “./20140421/scripts/Ib/” folder.

I am able to convert all the MP3s into their associated WAVs, though.

Kindly let me know where I am going wrong; the TSV files were downloaded directly from the Common Voice Irish dataset.

tsv_files.zip (133.0 KB)

Please provide:

  • language
  • release
  • output of your run
  • deepspeech clone ref

Language : Irish

Release : ga-IE_4h_2019-12-10

DeepSpeech Clone Ref : git clone https://github.com/mozilla/DeepSpeech

Output of Run:
python bin/import_cv2.py --filter_alphabet ./20140421/path/to/some/alphabet.txt ./20140421/scripts/Ib/ :

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/train.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/train.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/train.csv
Imported 0 samples.
Skipped 555 samples that failed on transcript validation.
Final amount of imported audio: 0:27:58.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/test.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/test.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/test.csv
Imported 0 samples.
Skipped 486 samples that failed on transcript validation.
Final amount of imported audio: 0:32:41.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/dev.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/dev.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/dev.csv
Imported 0 samples.
Skipped 444 samples that failed on transcript validation.
Final amount of imported audio: 0:24:55.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/validated.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/validated.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/validated.csv
Imported 0 samples.
Skipped 2717 samples that failed on transcript validation.
Final amount of imported audio: 2:29:27.
Loading TSV file: /home/ritish/DeepSpeech/DeepSpeech/20140421/scripts/Ib/other.tsv
Saving new DeepSpeech-formatted CSV file to: ./20140421/scripts/Ib/clips/other.csv
Importing mp3 files…
Writing CSV file for DeepSpeech.py as: ./20140421/scripts/Ib/clips/other.csv
Imported 0 samples.
Skipped 1166 samples that failed on transcript validation.
Final amount of imported audio: 1:15:16.

Please help.

Please read your output …

So all your samples get rejected by your filter selection?

Are you sure about what you are doing here?

Likewise, are you sure you are not missing --validate_label_locale?

https://deepspeech.readthedocs.io/en/master/search.html?q=validate_label_locale&check_keywords=yes&area=default
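To illustrate why every sample can end up skipped, here is a minimal sketch of an alphabet filter. It assumes the filter drops any transcript containing a character that is not listed in the alphabet file; the helper names and the sample transcripts are made up for illustration, and this is not the importer's actual code.

```python
def load_alphabet(lines):
    """Collect labels from alphabet-file lines, skipping '#' comments and blanks."""
    chars = set()
    for line in lines:
        line = line.rstrip("\n")
        if line == "" or line.startswith("#"):
            continue
        chars.add(line)
    return chars


def can_encode(transcript, alphabet):
    """True only if every character of the transcript is in the alphabet."""
    return all(ch in alphabet for ch in transcript)


# Assumption: an English-style alphabet (space, a-z, apostrophe), one label per line.
english = load_alphabet([" \n"] + [c + "\n" for c in "abcdefghijklmnopqrstuvwxyz'"])

# Illustrative transcripts only, not taken from the dataset.
samples = ["tá sé go maith", "go raibh maith agat", "dia duit"]
kept = [s for s in samples if can_encode(s, english)]
skipped = [s for s in samples if not can_encode(s, english)]

print("kept:", kept)
print("skipped:", skipped)
```

With an alphabet that lacks the accented vowels, any transcript containing á, é, í, ó or ú would be filtered out under this assumption, which would be consistent with a large number of skipped samples.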

Okay, so I removed --filter_alphabet ./20140421/path/to/some/alphabet.txt and now I get valid CSV output.

But when I run this:

python DeepSpeech.py --train_files ./20140421/scripts/Ib/clips/train.csv --dev_files ./20140421/scripts/Ib/clips/dev.csv --test_files ./20140421/scripts/Ib/clips/test.csv

I get the following error:

I Could not find best validating checkpoint.
I Loading most recent checkpoint from /home/ritish/.local/share/deepspeech/checkpoints/train-0
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam_1
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 0:00:00 | Steps: 0 | Loss: 0.000000
Traceback (most recent call last):
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 30, in _label_from_string
return self._str_to_label[string]
KeyError: ‘é’

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 128, in text_to_char_array
transcript = alphabet.encode(transcript)
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 44, in encode
res.append(self._label_from_string(char))
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 36, in _label_from_string
).with_traceback(e.__traceback__)
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 30, in _label_from_string
return self._str_to_label[string]
KeyError: “ERROR: Your transcripts contain characters (e.g. ‘é’) which do not occur in ‘/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt’! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to ‘/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt’.”

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File “DeepSpeech.py”, line 12, in <module>
ds_train.run_script()
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py”, line 942, in run_script
absl.app.run(main)
File “/home/ritish/.local/lib/python3.7/site-packages/absl/app.py”, line 299, in run
_run_main(main, args)
File “/home/ritish/.local/lib/python3.7/site-packages/absl/app.py”, line 250, in _run_main
sys.exit(main(argv))
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py”, line 914, in main
train()
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py”, line 592, in train
train_loss, _ = run_set(‘train’, epoch, train_init_op)
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/train.py”, line 553, in run_set
exception_box.raise_if_set()
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py”, line 117, in raise_if_set
raise exception # pylint: disable = raising-bad-type
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/helpers.py”, line 125, in do_iterate
yield from iterable()
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/feeding.py”, line 123, in generate_values
transcript = text_to_char_array(sample.transcript, Config.alphabet, context=sample.sample_id)
File “/home/ritish/DeepSpeech/DeepSpeech/training/deepspeech_training/util/text.py”, line 136, in text_to_char_array
raise ValueError(‘While processing: {}\n{}’.format(context, e))
ValueError: While processing: 20140421/scripts/Ib/clips/common_voice_ga-IE_17638591.wav
“ERROR: Your transcripts contain characters (e.g. ‘é’) which do not occur in ‘/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt’! Use util/check_characters.py to see what characters are in your [train,dev,test].csv transcripts, and then add all these to ‘/home/ritish/DeepSpeech/DeepSpeech/data/alphabet.txt’.”

Please help me get this fixed. I am not sure whether the --filter_alphabet option is necessary or not.

Again, if you just read the error message, it’s telling you exactly what to do.

Have you read the documentation?
Have you read the code you are using?
Have you generated a valid alphabet file?
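As a rough sketch of what generating an alphabet could look like: the snippet below collects every distinct character from the transcript column of a DeepSpeech-style CSV and formats them one per line. The function names are hypothetical, the demo CSV is made up, and the repository's util/check_characters.py mentioned in the error message is the tool to prefer; this only shows the idea.

```python
import csv
import io


def collect_characters(csv_file, column="transcript"):
    """Return the sorted distinct characters found in the transcript column."""
    chars = set()
    for row in csv.DictReader(csv_file):
        chars.update(row[column])
    return sorted(chars)


def format_alphabet(chars):
    """Format labels one per line, with a leading '#' comment line.

    Assumption: no transcript contains a literal '#', which would need
    escaping in the alphabet file and is not handled here.
    """
    lines = ["# One label per line; lines starting with '#' are comments."]
    lines.extend(chars)
    return "\n".join(lines) + "\n"


# Made-up in-memory CSV using the wav_filename, wav_filesize, transcript columns.
demo = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,tá sé\n"
    "b.wav,2000,dia duit\n"
)
chars = collect_characters(demo)
alphabet_text = format_alphabet(chars)
print(alphabet_text)
```

For real data you would open train.csv, dev.csv and test.csv with encoding="utf-8", merge the character sets, write the result to a file, and point training at it (for example via --alphabet_config_path) instead of the default data/alphabet.txt.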