import_cv2: all files failed to convert

I’m using the latest DeepSpeech git clone on Ubuntu 18.04, and have downloaded Common Voice 2, as required.

When running

bin/import_cv2.py

the program correctly finds the *.tsv files and clip folders, but then reports

Final amount of imported audio: 0

and says that every sample was skipped because it failed upon conversion. No other errors are reported.

Any help would be greatly appreciated.

Can you share the log so we can have a look?

Here is the terminal output. Is there a log file somewhere as well? I did not find any reference to one when looking through import_cv2.py.

(nlp) orchestrate@gpurig:~/projects/DeepSpeech$ bin/import_cv2.py --filter_alphabet /home/orchestrate/projects/DeepSpeech/data/alphabet.txt /home/orchestrate/projects/corpora/common_voice_2
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/train.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/train.csv
Importing mp3 files…
Progress |################################################################################################################################################################################## | 99% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/train.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 63330 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/test.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/test.csv
Importing mp3 files…
Progress |###################################################################################################################################################################################| 100% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/test.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 13178 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Loading TSV file: /home/orchestrate/projects/corpora/common_voice_2/dev.tsv
Saving new DeepSpeech-formatted CSV file to: /home/orchestrate/projects/corpora/common_voice_2/clips/dev.csv
Importing mp3 files…
Progress |###################################################################################################################################################################################| 100% completedWriting CSV file for DeepSpeech.py as: /home/orchestrate/projects/corpora/common_voice_2/clips/dev.csv
Progress |# | 100% completed
Imported 0 samples.
Skipped 13178 samples that failed upon conversion.
Final amount of imported audio: 0:00:00.
Progress |###################################################################################################################################################################################| 100% completed
Progress |###################################################################################################################################################################################| 100% completed
Progress |###################################################################################################################################################################################| 100% completed
(nlp) orchestrate@gpurig:~/projects/DeepSpeech$

How fast does it complete?

With an Intel Xeon E5-2650 v2 CPU @ 2.60GHz and 64 GB RAM, the entire import_cv2 run takes 825.22 seconds.

import_cv2.py is masking the real error. Try removing the try block here so you can see the actual problem:
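To illustrate what "masking" means here, the following is a minimal, self-contained sketch and not the actual import_cv2.py code. The `convert` function is a hypothetical stand-in that always fails, and `import_masked` / `import_verbose` are made-up names. The first loop swallows the exception and only counts a skip, which is how you end up with "Imported 0 samples" and no error message. The second keeps the first traceback so the cause becomes visible.

```python
import traceback


def convert(path):
    # Hypothetical stand-in for the real mp3-to-wav conversion step.
    # It always fails, to simulate a broken conversion backend.
    raise RuntimeError("simulated conversion failure for " + path)


def import_masked(paths):
    """Count imported/skipped files, silently discarding the cause of each failure."""
    imported, skipped = 0, 0
    for p in paths:
        try:
            convert(p)
            imported += 1
        except Exception:
            skipped += 1  # the exception is dropped, so the user never sees why
    return imported, skipped


def import_verbose(paths):
    """Same counting, but remember the first traceback so it can be reported."""
    imported, skipped, first_error = 0, 0, None
    for p in paths:
        try:
            convert(p)
            imported += 1
        except Exception:
            skipped += 1
            if first_error is None:
                first_error = traceback.format_exc()
    return imported, skipped, first_error


if __name__ == "__main__":
    print(import_masked(["a.mp3", "b.mp3"]))
    imported, skipped, first_error = import_verbose(["a.mp3", "b.mp3"])
    print(imported, skipped)
    print(first_error)
```

Removing the try block entirely, as suggested, has the same diagnostic effect: the first failing file stops the run and prints the full traceback.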

This is really odd: it works now. Because of some other challenges, I had to reinstall SWIG and rebuild ctcdecoder in parallel to updating / running import_cv2. This seems to have “unblocked” something (what, I can’t tell), and it works now as expected.

Thanks for the help reuben and lissyx.

Most likely you fixed the sox package at the same time.
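If you want to check that guess, here is a rough sketch that asks the sox command-line tool which formats it supports. It assumes that `sox --help` prints a line starting with "AUDIO FILE FORMATS:", which is the case for the builds I know of but may differ on yours. The function names are made up for this example. On Ubuntu, mp3 support usually comes from a separate package (libsox-fmt-mp3), so a missing "mp3" entry would be consistent with every conversion failing.

```python
import shutil
import subprocess


def parse_sox_formats(help_text):
    """Return the set of formats listed on the 'AUDIO FILE FORMATS:' line, if any."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            return set(line.split(":", 1)[1].split())
    return set()  # line not found (assumption about the output format did not hold)


def sox_supports_mp3():
    """Return True/False for mp3 support, or None if no sox executable is on PATH."""
    exe = shutil.which("sox")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams in case help goes to stderr
        universal_newlines=True,
    )
    return "mp3" in parse_sox_formats(result.stdout)


if __name__ == "__main__":
    print("sox mp3 support:", sox_supports_mp3())
```

A result of None means sox is not installed or not on PATH at all. False means sox is there, but no mp3 handler was found in its help output.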