Good day,
Here is some additional info:
- Have I written custom code (as opposed to running examples on an unmodified clone of the repository): No
- OS Platform and Distribution: Linux Ubuntu 18.04
- TensorFlow installed from (our builds, or upstream TensorFlow): Your builds
- TensorFlow version: 1.15.4
- Python version: Python 3.6.9
- Exact command to reproduce: python3 ./bin/import_cv2.py --audio_dir /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en
I’m trying to follow the Common Voice training instructions, the part where I have to run import_cv2.py, but I get an error when I execute it:
Loading TSV file: /home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/train.tsv
Importing mp3 files…
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Progress |################################ | 15% completed
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "./bin/import_cv2.py", line 72, in one_sample
    ["soxi", "-s", wav_filename], stderr=subprocess.STDOUT
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['soxi', '-s', '/home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_19719741.wav']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "./bin/import_cv2.py", line 221, in <module>
    main()
  File "./bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "./bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "./bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
subprocess.CalledProcessError: Command '['soxi', '-s', '/home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips/common_voice_en_19719741.wav']' returned non-zero exit status 1.
I ran the command several times, and the error mostly points to these files:
common_voice_en_19719741.wav
common_voice_en_19719781.wav
common_voice_en_19719783.wav
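To get the full list of affected clips in one pass instead of one crash per run, something like the following could be used. It is only a sketch: it assumes `soxi` is on the PATH and repeats the same `soxi -s` call that appears in the traceback. The function names `soxi_can_read` and `find_unreadable` are made up for illustration and are not part of import_cv2.py.

```python
import subprocess
from pathlib import Path


def soxi_can_read(wav_path):
    """Return True if `soxi -s` exits with status 0 for this file."""
    result = subprocess.run(
        ["soxi", "-s", str(wav_path)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_unreadable(clips_dir, probe=soxi_can_read):
    """Return the sorted .wav paths in clips_dir that the probe rejects.

    `probe` is a hypothetical hook so the check can be swapped out
    (for example, for a simple file-size test).
    """
    return sorted(p for p in Path(clips_dir).glob("*.wav") if not probe(p))
```

Calling `find_unreadable("/home/user/ds/mozilla-common-voice/cv-corpus-6.1-2020-12-11/en/clips")` would return every .wav file in that directory that `soxi` cannot read, which should show whether the problem is limited to a handful of clips.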
When I tried to play these files in VLC, I got an error:
[00007f5f90001bc0] cache_read stream error: cannot pre fill buffer
[00007f5f90001180] mjpeg demux error: cannot peek
Other files seem to play without any errors.
In this issue, the user just updated ffmpeg and it worked for him: https://github.com/mozilla/DeepSpeech/issues/3104. I tried the same, but I still get the error.
I’m using the most recent sox version, 14.4.2
Are these unplayable mp3 files corrupted? Should they be deleted?
Or maybe something went wrong when I uncompressed them? (I used tar and pigz to uncompress.)
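On the extraction question, one check that does not involve sox at all would be to read the downloaded archive from start to end with Python's `tarfile` module and see whether it raises an error. The sketch below assumes the original .tar.gz is still on disk. `archive_is_readable` is a hypothetical helper name. A clean result only means the archive itself decompresses without errors, not that every mp3 inside it is valid audio.

```python
import tarfile
import zlib


def archive_is_readable(archive_path):
    """Read every member of a tar archive; return (ok, message)."""
    try:
        # "r:*" lets tarfile detect the compression (gzip, bz2, xz, none).
        with tarfile.open(archive_path, "r:*") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                fh = tar.extractfile(member)
                # Read in 1 MiB chunks so large clips do not fill memory.
                while fh.read(1024 * 1024):
                    pass
    except (tarfile.TarError, EOFError, OSError, zlib.error) as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, "all members read without error"
```

If this returns `False`, the download itself is probably truncated or damaged and downloading it again would be the first thing to try. If it returns `True`, the extraction step or the individual clips are the more likely suspects.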