Error when running import_cv2.py: soxi returned non-zero exit status 1

simon1 · January 9, 2021, 11:03am

Good day,

Here is some additional info:

Have I written custom code (as opposed to running examples on an unmodified clone of the repository): No
OS Platform and Distribution: Linux Ubuntu 18.04
TensorFlow installed from (our builds, or upstream TensorFlow): Your builds
TensorFlow version: 1.15.4
Python version: Python 3.6.9
Exact command to reproduce: python3 ./bin/import_cv2.py --audio_dir /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en

I’m trying to follow the Common Voice data training instructions, specifically the step where I have to run import_cv2.py, but I get an error when I execute it.

Loading TSV file: /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/train.tsv
Importing mp3 files…
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################ | 15% completed
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "./bin/import_cv2.py", line 72, in one_sample
    ["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['soxi', '-s', '/home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_19719741.wav']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./bin/import_cv2.py", line 221, in <module>
    main()
  File "./bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "./bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "./bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
subprocess.CalledProcessError: Command '['soxi', '-s', '/home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_19719741.wav']' returned non-zero exit status 1.

I ran it multiple times and it mostly fails on these files:
common_voice_en_19719741.wav
common_voice_en_19719781.wav
common_voice_en_19719783.wav

When I try to play these files with VLC, I get an error:
[00007f5f90001bc0] cache_read stream error: cannot pre fill buffer
[00007f5f90001180] mjpeg demux error: cannot peek

Other files seem to play without any errors.

In this issue, the user just updated ffmpeg and it worked for them. I tried that but I still get the error: https://github.com/mozilla/DeepSpeech/issues/3104

I’m using the most recent sox version, 14.4.2

Are these unplayable mp3 files corrupted? Should they be deleted?
Or maybe something went wrong when I uncompressed them? (I used tar and pigz to uncompress.)

othiele · January 9, 2021, 8:49am

Unfortunately, Common Voice sometimes contains corrupt files. Search the forum, or write a script of your own that checks all files and excludes those that can’t be read.
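
A minimal sketch of such a check, not taken from the DeepSpeech repository: it runs the same `soxi -s` call that appears in the traceback on every matching file and lists the ones for which the command exits with a non-zero status. The function names `probe_ok` and `find_unreadable`, the `*.wav` pattern, and the `probe_cmd` parameter are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def probe_ok(path, probe_cmd=("soxi", "-s")):
    """Return True if `probe_cmd <path>` exits with status 0."""
    try:
        # Same style of call as in the traceback: output is captured,
        # and a non-zero exit status raises CalledProcessError.
        subprocess.check_output(list(probe_cmd) + [str(path)],
                                stderr=subprocess.STDOUT)
        return True
    except subprocess.CalledProcessError:
        return False


def find_unreadable(clips_dir, pattern="*.wav", probe_cmd=("soxi", "-s")):
    """List files in clips_dir matching pattern that the probe rejects."""
    return sorted(p for p in Path(clips_dir).glob(pattern)
                  if not probe_ok(p, probe_cmd))
```

Calling `find_unreadable` on the clips directory and printing the result gives a list of candidates to exclude or delete before rerunning the importer. The probe is run once per file and sequentially, so on a corpus of this size it may take a while.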

simon1 · January 9, 2021, 3:26pm

Oh alright I see. Thanks for the information.

simon1 · January 9, 2021, 7:41pm

For reference for other users, here’s the command to find files with 0 bytes and delete them:

find . -name 'file*' -size 0 -print0 | xargs -0 rm

src: https://stackoverflow.com/questions/3157343/how-to-delete-many-0-byte-files-in-linux
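
For anyone who prefers to review the list before deleting, here is a rough Python equivalent of the command above, with a dry-run default. It is only a sketch: `find_empty_files` and `delete_files` are hypothetical names, and the pattern is left as a parameter because `'file*'` comes from the Stack Overflow answer and would not match clip names such as `common_voice_en_19719741.wav`.

```python
from pathlib import Path


def find_empty_files(root, pattern="*"):
    """Recursively list regular files under root that match pattern and are 0 bytes."""
    return sorted(p for p in Path(root).rglob(pattern)
                  if p.is_file() and p.stat().st_size == 0)


def delete_files(paths, dry_run=True):
    """Delete the given files unless dry_run is True; return how many were deleted."""
    deleted = 0
    for p in paths:
        if dry_run:
            print("would delete:", p)  # report only, leave the file in place
        else:
            p.unlink()
            deleted += 1
    return deleted
```

A typical use would be to call `find_empty_files(".", "common_voice_*")`, inspect the result, and then pass it to `delete_files(..., dry_run=False)`. The matching rows would presumably still need to be excluded from the TSV files, which this sketch does not do.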