SoXI failed with exit code 1

Hello,

I tried to use import_cv2.py to import the Common Voice dataset. When it reaches the last phase (train.tsv), it always fails at 49% with this error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/sox-1.4.0b0-py3.6.egg/sox/core.py", line 149, in soxi
stderr=subprocess.PIPE
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sox', '--i', '-c', '/home/zontax/Desktop/de/clips/common_voice_de_19411969.mp3']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "import_cv2.py", line 53, in one_sample
_maybe_convert_wav(mp3_filename, wav_filename)
File "import_cv2.py", line 163, in _maybe_convert_wav
transformer.build(mp3_filename, wav_filename)
File "/usr/local/lib/python3.6/dist-packages/sox-1.4.0b0-py3.6.egg/sox/transform.py", line 594, in build
input_filepath, input_array, sample_rate_in
File "/usr/local/lib/python3.6/dist-packages/sox-1.4.0b0-py3.6.egg/sox/transform.py", line 496, in _parse_inputs
input_format['channels'] = file_info.channels(input_filepath)
File "/usr/local/lib/python3.6/dist-packages/sox-1.4.0b0-py3.6.egg/sox/file_info.py", line 82, in channels
output = soxi(input_filepath, 'c')
File "/usr/local/lib/python3.6/dist-packages/sox-1.4.0b0-py3.6.egg/sox/core.py", line 153, in soxi
raise SoxiError("SoXI failed with exit code {}".format(cpe.returncode))
sox.core.SoxiError: SoXI failed with exit code 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "import_cv2.py", line 223, in <module>
main()
File "import_cv2.py", line 220, in main
params.space_after_every_character)
File "import_cv2.py", line 37, in _preprocess_data
set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, filter_obj, space_after_every_character)
File "import_cv2.py", line 116, in _maybe_convert_set
for i, processed in enumerate(pool.imap_unordered(one_sample, samples_with_context), start=1):
File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
raise value
sox.core.SoxiError: SoXI failed with exit code 1

I’m running this on Ubuntu 18.04 with Python 3.6.9 and installed everything through setup.py. The sox and soxi commands both work fine (no errors so far). TensorFlow is installed through pip: version 1.15.2, git_version v1.15.0-92-g5d80e1e.

I hope someone knows how to fix this issue.

Thanks in advance!

This is exactly the opposite of what your log shows. Could you please run the soxi command from the log by hand and share the complete output? Without that, there’s nothing we can do here.
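
If running it from a shell is awkward, here is a minimal sketch that reruns the command from the log (`sox --i -c <file>`) and keeps stderr, which the traceback above does not show. The function name `probe_channels` is made up for illustration, and it assumes `sox` is on the PATH.

```python
import subprocess


def probe_channels(path, cmd=("sox", "--i", "-c")):
    """Run the probe command from the log on one file.

    Returns (exit code, stdout, stderr) so the real error message is visible.
    `probe_channels` is a hypothetical helper, not part of import_cv2.py or pysox.
    """
    result = subprocess.run(
        [*cmd, str(path)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


# Example call, using the file named in the traceback:
# code, out, err = probe_channels(
#     "/home/zontax/Desktop/de/clips/common_voice_de_19411969.mp3")
# print(code, out, err)
```

A non-zero exit code together with the stderr text is the "complete output" asked for here.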

Just try the statement

Usually this means an empty or otherwise corrupted file in the data. Unfortunately, that happens. Just run a script beforehand that scans all files.
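
Such a scan could look roughly like the sketch below. It flags zero-byte files and files for which the probe command from the log (`sox --i -c`) exits with a non-zero status. The function name `find_broken_clips`, the `*.mp3` pattern, and the directory layout are assumptions for illustration, and `sox` has to be on the PATH.

```python
import subprocess
from pathlib import Path


def find_broken_clips(clips_dir, probe_cmd=("sox", "--i", "-c"), pattern="*.mp3"):
    """Return a sorted list of clips that are empty or that the probe rejects.

    Hypothetical helper: it only checks what the importer trips over,
    not whether the audio content itself is sensible.
    """
    broken = []
    for clip in sorted(Path(clips_dir).glob(pattern)):
        # Empty files are broken by definition; no need to start sox for them.
        if clip.stat().st_size == 0:
            broken.append(clip)
            continue
        result = subprocess.run(
            [*probe_cmd, str(clip)],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            broken.append(clip)
    return broken


# Example call, using the clips directory from the traceback:
# for clip in find_broken_clips("/home/zontax/Desktop/de/clips"):
#     print(clip.name)
```

It starts one process per file, so it is slow on a full dataset, but it only has to run once before the import.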

Just realised it’s the German dataset :)

(1) Use audiomate’s list of erroneous files:

(2) A new Common Voice dataset was released yesterday. Why not use that?
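
For option (1), one way to apply such a list is to drop the affected rows from train.tsv before running the importer. The sketch below assumes you already have the bad clip names as plain strings that match the `path` column of the TSV exactly; how the list is actually formatted is not covered here, and `filter_tsv` is a made-up name.

```python
def filter_tsv(tsv_in, tsv_out, bad_names, column="path"):
    """Copy a Common Voice TSV, skipping rows whose `column` value is in bad_names.

    Returns the number of rows dropped. Lines are copied verbatim, so quotes
    inside sentences are left untouched. Hypothetical helper for illustration.
    """
    bad = set(bad_names)
    dropped = 0
    with open(tsv_in, encoding="utf-8", newline="") as src, \
            open(tsv_out, "w", encoding="utf-8", newline="") as dst:
        header = src.readline()
        # Locate the column by name instead of hard-coding its position.
        col = header.rstrip("\r\n").split("\t").index(column)
        dst.write(header)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > col and fields[col] in bad:
                dropped += 1
                continue
            dst.write(line)
    return dropped


# Example call with the clip from the traceback:
# filter_tsv("train.tsv", "train_clean.tsv", ["common_voice_de_19411969.mp3"])
```

Writing to a new file keeps the original train.tsv intact in case the list turns out to be too aggressive.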

Has this list been reported upstream? How broken are those files? Is it just sox that fails on them, or are they plainly wrong?

Not to my knowledge. I don’t know why this happens, as the files should be checked, but they are definitely broken, empty, or otherwise unusable. I can’t remember what exactly was wrong with them; I will check the new release once I try it.

Maybe this should be part of Corpora Creator, so that it rejects those files.
