import_cv2: all files failed to convert

pvk444 · July 25, 2019, 8:29pm

I’m using the latest DeepSpeech git clone on Ubuntu 18.04, and have downloaded Common Voice 2, as required.

When running

bin/import_cv2.py

the program correctly finds the *.tsv files and clip folders, but then reports

Final amount of imported audio: 0

and every file is skipped as having failed during conversion. No other errors are reported.

Any help would be greatly appreciated.

lissyx · July 26, 2019, 6:20am

Can you share the log so we can have a look?

pvk444 · July 26, 2019, 7:01am

Here is the terminal output. Is there also a log file somewhere? I did not find any reference to one while looking through import_cv2.py.

(nlp) orchestrate@gpurig:~/projects/DeepSpeech$ bin/import_cv2.py --filter_alphabet /home/orchestrate/projects/DeepSpeech/data/alphabet.txt /home/orchestrate/projects/corpora/common_voice_2
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/train.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/train.csv
Importing mp3 files…
Progress |################################################################################################################################################################################## | 99% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/train.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 63330 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/test.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/test.csv
Importing mp3 files…
Progress |###################################################################################################################################################################################| 100% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/test.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 13178 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/dev.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/dev.csv
Importing mp3 files…
Progress |###################################################################################################################################################################################| 100% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/dev.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 13178 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Progress |###################################################################################################################################################################################| 100% completed
Progress |###################################################################################################################################################################################| 100% completed
Progress |###################################################################################################################################################################################| 100% completed
(nlp) orchestrate@gpurig:~/projects/DeepSpeech$

lissyx · July 26, 2019, 7:09am

How fast does it complete?

pvk444 · July 26, 2019, 7:56am

On an Intel Xeon E5-2650 v2 @ 2.60GHz with 64 GB of RAM, the entire import_cv2 run takes 825.22 seconds.

reuben · July 26, 2019, 8:54am

import_cv2.py is masking the real error. Try removing the try block here so you can see the actual problem:

github.com/mozilla/DeepSpeech/blob/daa6167829e7eee45f22ef21f81b24d36b664f7a/bin/import_cv2.py#L134-L137


try:
    transformer.build(mp3_filename, wav_filename)
except sox.core.SoxError:
    # The error is swallowed here, so a failed conversion only shows up
    # later as a skipped sample.
    pass
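Instead of deleting the try block outright, another option is to keep it but record each error so the import still finishes and you can see why files fail. The sketch below is a minimal illustration of that pattern, not code from import_cv2.py: `convert_all`, `ConversionError` and `fake_convert` are hypothetical names, with `ConversionError` standing in for `sox.core.SoxError` and the `convert` callable standing in for `transformer.build(mp3_filename, wav_filename)`.

```python
from typing import Callable, Dict, Iterable, Tuple


class ConversionError(Exception):
    """Hypothetical stand-in for sox.core.SoxError in this sketch."""


def convert_all(
    pairs: Iterable[Tuple[str, str]],
    convert: Callable[[str, str], None],
) -> Tuple[int, Dict[str, str]]:
    """Convert each (mp3, wav) pair and keep the reason for every failure.

    Unlike `except ...: pass`, each failure is stored with its error
    message, keyed by the input file name.
    """
    converted = 0
    failures: Dict[str, str] = {}
    for mp3_filename, wav_filename in pairs:
        try:
            convert(mp3_filename, wav_filename)
        except ConversionError as err:
            failures[mp3_filename] = str(err)
        else:
            converted += 1
    return converted, failures


def fake_convert(src: str, dst: str) -> None:
    # Hypothetical converter that cannot read one particular input.
    if src.endswith("bad.mp3"):
        raise ConversionError("simulated failure for " + src)


converted, failures = convert_all(
    [("a.mp3", "a.wav"), ("bad.mp3", "bad.wav")], fake_convert
)
print(converted)  # 1
print(failures)   # {'bad.mp3': 'simulated failure for bad.mp3'}
```

In the real script, the equivalent change would be to bind the exception (`except sox.core.SoxError as err:`) and print or log `mp3_filename` and `err` instead of `pass`; the first few messages are usually enough to tell whether the problem is sox itself or the input files.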

pvk444 · July 26, 2019, 11:16am

This is really odd: it works now. Because of some other issues, I had to reinstall SWIG and rebuild ctcdecoder while also updating and re-running import_cv2. That seems to have "unblocked" something (I can't tell what), and it now works as expected.

Thanks for the help reuben and lissyx.

eggonlea · July 27, 2019, 5:36pm

Most likely you fixed the sox package at the same time.
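If sox is the suspect, it can be checked directly. import_cv2.py converts MP3 to WAV through pysox, which wraps the sox command-line tool, so a sox install without MP3 support would make every conversion fail (on Ubuntu, MP3 support for sox usually comes from a separate package, libsox-fmt-mp3). A minimal sketch of such a check follows; it assumes `sox -h` prints a line beginning with `AUDIO FILE FORMATS:` followed by the supported format names, which is how SoX 14.4.x behaves but should be treated as an assumption for other versions. `supported_formats` and `sox_help_text` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import Optional, Set


def supported_formats(help_text: str) -> Set[str]:
    """Parse the 'AUDIO FILE FORMATS:' line of `sox -h` output (assumed layout)."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            return set(line.split(":", 1)[1].split())
    return set()


def sox_help_text() -> Optional[str]:
    """Run `sox -h` if sox is on PATH; return stdout+stderr, else None."""
    if shutil.which("sox") is None:
        return None
    result = subprocess.run(
        ["sox", "-h"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case help goes to stderr
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    text = sox_help_text()
    if text is None:
        print("sox is not on PATH")
    elif "mp3" in supported_formats(text):
        print("sox lists mp3 among its formats")
    else:
        print("sox is installed but does not list mp3")
```

Running the same check before and after reinstalling packages would show whether the sox setup changed between the failing and the working import.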