import_cv2.py: sox returns non-zero exit status 1 for every mp3 file in test.tsv

I’ve been trying to run import_cv2.py for many days and can’t resolve this error:

subprocess.CalledProcessError: Command '['sox', '--i', '-c', 'CORPUS\\cv-corpus-6.1-2020-12-11\\fr\\clips\\common_voice_fr_19738183.mp3']' returned non-zero exit status 1.

I did find some posts on this forum about this error, but the answers didn’t help me: Error when using import_cv2.py returned non-zero exit status 1
I read that I should remove the mp3 file because it is probably corrupted. I don’t think that applies to me, because when I removed it, the same error came up for the next mp3 file in test.tsv, then the next one, and so on.

I should mention that I have sox 1.4.1 installed with pip.
Curiously, I couldn’t run sox in my terminal with only this package installed, so I followed some advice from forums and installed sox from this site: https://sourceforge.net/projects/sox/files/sox/ I downloaded the latest version and installed it correctly. Now sox works in my terminal. Curiously, it is version 14.4.2, and the site has no version matching my 1.4.1 package, or any version below 12.

You are right, you need to install sox with something like sudo apt install sox and pip install sox. I have both, v14.4.2 and 1.4.1, running just fine.
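A quick way to confirm that the command-line tool is visible, and not only the pip wrapper, is to look it up from the same Python environment that runs the importer. This is a minimal sketch; `find_sox` is a hypothetical helper name, not part of DeepSpeech or of the pip sox package.

```python
import shutil
import subprocess


def find_sox():
    """Return (path, version_text) for the sox executable on PATH, or (None, None)."""
    path = shutil.which("sox")  # searches PATH for sox (sox.exe on Windows)
    if path is None:
        return None, None
    # `sox --version` prints the version of the command-line tool
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return path, (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    sox_path, sox_version = find_sox()
    if sox_path is None:
        print("sox executable not found on PATH")
    else:
        print("sox found at", sox_path)
        print(sox_version)
```

If this prints a path and a 14.x version, the system sox is reachable from Python; the 1.4.1 number belongs to the pip wrapper only.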

So is your system sox really 1.14.2 or 14.4.2? And is the import script running now?

No, sorry, that was a mistake. It is version 14.4.2. I edited my question.
For now, the script raising the error is \DeepSpeech\envtrain\lib\site-packages\sox\core.py, envtrain being my Python venv, so I don’t understand what to do, because I have already set my environment variables and my terminal normally recognizes sox.

I should mention that I’m working on Windows, not Linux.

  1. Don’t train on Windows, this will lead to many more errors. Inferencing is fine though.

  2. You didn’t post the error message, so it is hard to tell what is wrong. It is OK that sox is installed in your virtual environment.

Why not switch to Google Colab to get going? Most things, like sox, work out of the box there.

Thanks for your answer, I will try on Google Colab.

Oh, I didn’t want to paste the whole error because it’s big. I thought the key was in the line I pasted, but here is the entire error message:

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\core.py", line 149, in soxi
    stderr=subprocess.PIPE
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sox', '--i', '-c', 'CORPUS\\cv-corpus-6.1-2020-12-11\\fr\\clips\\common_voice_fr_19738183.mp3']' returned non-zero exit status 1.   

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\PROJETS\DeepSpeech\bin\import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "C:\PROJETS\DeepSpeech\bin\import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\transform.py", line 496, in _parse_inputs
    input_format['channels'] = file_info.channels(input_filepath)
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\file_info.py", line 82, in channels
    output = soxi(input_filepath, 'c')
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\core.py", line 153, in soxi
    raise SoxiError("SoXI failed with exit code {}".format(cpe.returncode))
sox.core.SoxiError: SoXI failed with exit code 1

and it continues with the same …

 WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
    The above exception was the direct cause of the following exception:

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Traceback (most recent call last):
  File "bin\import_cv2.py", line 221, in <module>
    main()
  File "bin\import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
  File "bin\import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin\import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 748, in next
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
    raise value
sox.core.SoxiError: SoXI failed with exit code 1
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.

  1. Looks like you don’t provide a fr flag. Check with --help.

  2. Is sox now on your PATH? Looks like Python can’t find it. Maybe close the console and open it again.
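Since `subprocess.CalledProcessError` is raised only after a process has run and returned a non-zero status, it may help to look at what sox itself printed, because the `SoxiError` in the traceback carries only the exit code. The sketch below reruns the same command as in the traceback and captures stderr. `run_soxi` is a hypothetical helper, and the clip path is simply the one from the error message.

```python
import subprocess


def run_soxi(filepath, executable="sox"):
    """Run `sox --i -c <file>` and return (exit_code, stdout, stderr)."""
    try:
        result = subprocess.run(
            [executable, "--i", "-c", filepath],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        # Raised when the executable itself cannot be started
        return None, "", "executable not found: " + executable
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    # Path copied from the error message; adjust to your own layout
    clip = r"CORPUS\cv-corpus-6.1-2020-12-11\fr\clips\common_voice_fr_19738183.mp3"
    code, out, err = run_soxi(clip)
    print("exit code:", code)
    print("stdout:", out)
    print("stderr:", err)
```

An exit code of `None` here would mean Python could not start sox at all; an exit code of 1 together with a message on stderr would point at the file or its format instead.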

If you are working on French, have you considered the already trained models, as well as the Dockerfile to train one?


1 - Sorry, I don’t understand what you mean by flag. I ran import_cv2.py --help and read every argument; none of them mentions a “flag” for a specific language.

2 - sox is indeed defined in my PATH.

By the way, I tried with Google Colab and got the same error.

@LucieDevGirl Again, why don’t you just reuse the existing French model and its Dockerfile to train? It includes the Common Voice dataset …

Yes, I will try this solution now, thank you :slight_smile:

Yep, I thought there was a language specific parameter, but that could have been audiomate.

Then share a link to it here, so we can check.

Maybe --validate_label ?

Yes, here is my Colab:

https://drive.google.com/file/d/150dZpMZ87-0cboAx06htbbHiLqUILOUM/view?usp=sharing

After doing this, I need to learn how to handle Docker, and maybe switch to Linux.

We have released models that you can use directly …

Ideally you set up the Colab so we can run it too. You seem to upload files yourself, which is not reproducible. And please follow up on the error messages. The current one hints at a wrong path or a missing file:

OSError: input_filepath CORPUS/cv-corpus-6.1-2020-12-11/fr/clips/common_voice_fr_19738183.mp3 does not exist.
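The OSError above reports a relative path, so it is resolved against whatever directory the script runs from. A minimal sketch for checking which clips are actually reachable, assuming the Common Voice TSV has a `path` column holding the clip file name and that the clips sit in a `clips` folder next to it (`missing_clips` is a hypothetical helper, not part of import_cv2.py):

```python
import csv
import os


def missing_clips(tsv_path, clips_dir):
    """Return the clip paths listed in the TSV that do not exist under clips_dir."""
    missing = []
    with open(tsv_path, encoding="utf-8", newline="") as tsv_file:
        # QUOTE_NONE: treat quote characters in sentences as plain text
        reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            clip_path = os.path.join(clips_dir, row["path"])
            if not os.path.isfile(clip_path):
                missing.append(clip_path)
    return missing


if __name__ == "__main__":
    # Directory layout taken from the error message; adjust as needed
    corpus_dir = os.path.join("CORPUS", "cv-corpus-6.1-2020-12-11", "fr")
    tsv = os.path.join(corpus_dir, "test.tsv")
    print("working directory:", os.getcwd())
    if os.path.isfile(tsv):
        absent = missing_clips(tsv, os.path.join(corpus_dir, "clips"))
        print(len(absent), "clips listed in test.tsv were not found")
        for path in absent[:5]:
            print("  ", path)
    else:
        print("test.tsv not found at", os.path.abspath(tsv))
```

If every clip is reported missing, the working directory or the upload location is the likely culprit rather than the individual files.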