import_cv2.py: returned non-zero exit status 1 for every mp3 file in test.tsv

LucieDevGirl · February 3, 2021, 9:59am

I have been trying to run import_cv2.py for many days and can’t resolve this error:

subprocess.CalledProcessError: Command '['sox', '--i', '-c', 'CORPUS\\cv-corpus-6.1-2020-12-11\\fr\\clips\\common_voice_fr_19738183.mp3']' returned non-zero exit status 1.

I did find some threads on this forum about this error, but the answers were not useful for me: “Error when using import_cv2.py returned non-zero exit status 1”.
I read that I should remove the mp3 file because it is probably corrupted. I don’t think that applies to me: when I removed it, the same error was raised for the next mp3 file in test.tsv, and the next one, and the next one …
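Since every clip fails and not just one, a quick way to tell a path problem from corrupt files is to check whether the clips listed in test.tsv actually exist where the importer looks for them. A minimal sketch, assuming the TSV has a header row with a path column holding the mp3 filename (as in the Common Voice releases) and that all clips sit in one clips directory; the function name missing_clips is just for illustration:

```python
import csv
import os


def missing_clips(tsv_path, clips_dir):
    """Return the filenames listed in a Common Voice TSV that are not found in clips_dir.

    Assumes a header row with a 'path' column containing the mp3 filename,
    e.g. common_voice_fr_19738183.mp3.
    """
    missing = []
    with open(tsv_path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE: transcripts may contain quote characters that are not CSV quoting.
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            clip = os.path.join(clips_dir, row["path"])
            if not os.path.isfile(clip):
                missing.append(row["path"])
    return missing
```

For example, missing_clips(r"CORPUS\cv-corpus-6.1-2020-12-11\fr\test.tsv", r"CORPUS\cv-corpus-6.1-2020-12-11\fr\clips"). If it returns every filename, the directory is wrong; if it returns nothing, the files are there and the problem is sox failing to read them.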

I should mention that I have sox 1.4.1 installed with pip.
Strangely, I couldn’t run sox in my terminal with only this package installed, so I followed some advice from forums and installed sox from this site https://sourceforge.net/projects/sox/files/sox/ I downloaded the latest version and installed it correctly. Now sox works in my terminal. Strangely, it is version 14.4.2, and there is no version 1.4.1 like my package, nor anything below 12.
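A possible cause worth ruling out, hedged since I have not checked your exact install: as far as I know, the SoX 14.4.2 Windows build from SourceForge does not read MP3 out of the box, and in that case sox fails on .mp3 input with a message along the lines of no handler for file extension 'mp3', which pysox only reports as exit code 1. The help text printed by sox -h includes an AUDIO FILE FORMATS: line listing what the binary can handle. A minimal sketch that reads that line from Python, assuming your build prints it in that form (the helper names are hypothetical):

```python
import shutil
import subprocess


def parse_sox_formats(help_text):
    """Return the set of formats listed on the 'AUDIO FILE FORMATS:' line of `sox -h` output."""
    for line in help_text.splitlines():
        if line.startswith("AUDIO FILE FORMATS:"):
            # Everything after the first colon is a space-separated list of format names.
            return set(line.split(":", 1)[1].split())
    return set()


def sox_formats():
    """Run the sox found on PATH with -h and return its formats, or None if sox is not on PATH."""
    exe = shutil.which("sox")
    if exe is None:
        return None
    # Merge stderr into stdout and ignore the exit code: only the help text matters here.
    result = subprocess.run([exe, "-h"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return parse_sox_formats(result.stdout)
```

sox_formats() returns None when no sox is on the PATH of the Python process; otherwise check whether "mp3" is in the returned set.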

othiele · February 3, 2021, 9:58am

You are right, you need to install sox with something like sudo apt install sox and pip install sox. I have both, v14.4.2 and 1.4.1, running just fine.

So is your system sox really 1.14.2 or 14.4.2? And is the import script running now?

LucieDevGirl · February 3, 2021, 10:03am

No, sorry, that was a mistake: it is version 14.4.2. I edited my question.
For now, the script raising the error is \DeepSpeech\envtrain\lib\site-packages\sox\core.py, envtrain being my Python venv, so I don’t understand what to do: I already set my environment variables, and my terminal usually recognizes sox.

I should also mention that I’m working on Windows, not Linux.

othiele · February 3, 2021, 10:12am

Don’t train on Windows, this will lead to many more errors. Inference is fine, though.
You didn’t post the error message, so it is hard to tell what is wrong. It is OK that sox is installed in your virtual environment.

Why not switch to Google Colab to get going? Most things, like sox, work out of the box there.

LucieDevGirl · February 3, 2021, 10:17am

Thanks for your answer, I will try Google Colab.

Oh, I didn’t want to paste the whole error because it’s long. I thought the key was in the line I pasted, but here is the entire error message:

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\core.py", line 149, in soxi
    stderr=subprocess.PIPE
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['sox', '--i', '-c', 'CORPUS\\cv-corpus-6.1-2020-12-11\\fr\\clips\\common_voice_fr_19738183.mp3']' returned non-zero exit status 1.   

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\PROJETS\DeepSpeech\bin\import_cv2.py", line 65, in one_sample
    _maybe_convert_wav(mp3_filename, wav_filename)
  File "C:\PROJETS\DeepSpeech\bin\import_cv2.py", line 185, in _maybe_convert_wav
    transformer.build(mp3_filename, wav_filename)
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\transform.py", line 594, in build
    input_filepath, input_array, sample_rate_in
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\transform.py", line 496, in _parse_inputs
    input_format['channels'] = file_info.channels(input_filepath)
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\file_info.py", line 82, in channels
    output = soxi(input_filepath, 'c')
  File "C:\PROJETS\DeepSpeech\envtrain\lib\site-packages\sox\core.py", line 153, in soxi
    raise SoxiError("SoXI failed with exit code {}".format(cpe.returncode))
sox.core.SoxiError: SoXI failed with exit code 1

and it continues with the same …

 WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
    The above exception was the direct cause of the following exception:

WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
Traceback (most recent call last):
  File "bin\import_cv2.py", line 221, in <module>
    main()
  File "bin\import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
  File "bin\import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin\import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "C:\Users\lucie\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 748, in next
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
    raise value
sox.core.SoxiError: SoXI failed with exit code 1
WARNING: No --validate_label_locale specified, your might end with inconsistent dataset.
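Reading the traceback bottom-up: transformer.build() asks file_info.channels() for the channel count, which calls soxi(), which runs sox --i -c on the file; when that command fails, pysox raises SoxiError: SoXI failed with exit code 1 and sox’s own error message is not shown. A minimal sketch that reruns the same command (copied from the CalledProcessError line) and returns sox’s stderr instead of raising; the function names are illustrative, not part of pysox:

```python
import subprocess


def soxi_command(path, flag, sox_exe="sox"):
    """Build the same argument list seen in the traceback: ['sox', '--i', '-c', path]."""
    return [sox_exe, "--i", "-" + flag, path]


def run_soxi_verbose(path, flag="c", sox_exe="sox"):
    """Run the soxi-style query and return (exit_code, stdout, stderr) instead of raising.

    exit_code is None if the sox executable itself cannot be found.
    """
    cmd = soxi_command(path, flag, sox_exe)
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                universal_newlines=True)
    except FileNotFoundError:
        return None, "", "executable not found: {}".format(sox_exe)
    return result.returncode, result.stdout.strip(), result.stderr.strip()
```

Run it from the same directory and venv as the importer, e.g. run_soxi_verbose(r"CORPUS\cv-corpus-6.1-2020-12-11\fr\clips\common_voice_fr_19738183.mp3"); the third element is sox’s own explanation (missing file, unsupported format, and so on).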

othiele · February 3, 2021, 10:17am

It looks like you don’t provide a fr flag. Check with --help.
Is sox on your Path now? It looks like Python can’t find it. Maybe close the console and open it again.
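Python looks up sox in the PATH of its own process, which is inherited when that process starts; a PATH changed after the console was opened is only seen by consoles opened afterwards, hence the advice to reopen it. A minimal sketch to check, from inside the same venv, which sox Python would actually run (the report layout is just an illustration):

```python
import os
import shutil
import sys


def sox_lookup_report(search_path=None):
    """Report which sox this Python process would run.

    search_path defaults to this process's own PATH, inherited when it started.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    return {
        "python": sys.executable,                          # which interpreter / venv is running
        "sox": shutil.which("sox", path=search_path),      # None if sox is not found
        "path_entries": search_path.split(os.pathsep),
    }
```

Call sox_lookup_report() inside the venv; if "sox" is None, Python cannot see the binary even if another terminal can.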

lissyx · February 3, 2021, 11:50am

If you are working on french, have you considered the already trained models as well as the dockerfile to train one?

github.com/Common-Voice/commonvoice-fr/blob/master/DeepSpeech/Dockerfile.train

FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04

ARG ds_repo=mozilla/DeepSpeech
ARG ds_branch=4270e22fe02f4fa7430a721ac917f6353c36f455
ARG ds_sha1=4270e22fe02f4fa7430a721ac917f6353c36f455
ARG cc_repo=mozilla/CorporaCreator
ARG cc_sha1=73622cf8399f8e634aee2f0e76dacc879226e3ac
ARG kenlm_repo=kpu/kenlm
ARG kenlm_branch=87e85e66c99ceff1fab2500a7c60c01da7315eec

# Model parameters
ARG model_language=fr
ENV MODEL_LANGUAGE=$model_language

# Training hyper-parameters
ARG batch_size=64
ENV BATCH_SIZE=$batch_size

ARG n_hidden=2048
ENV N_HIDDEN=$n_hidden

This file has been truncated.

LucieDevGirl · February 5, 2021, 4:40pm

1 - Sorry, I don’t understand what you mean by “flag”. I ran import_cv2.py --help and read every argument; none of them is a “flag” for a specific language.

2 - sox is indeed on my Path.

By the way, I tried Google Colab and got the same error.

lissyx · February 5, 2021, 4:48pm

@LucieDevGirl Again, why don’t you just reuse the existing French model and its Dockerfile to train? It includes the Common Voice dataset …

LucieDevGirl · February 5, 2021, 5:26pm

Yes, I will try this solution now, thank you.

othiele · February 7, 2021, 10:47am

Yep, I thought there was a language-specific parameter, but that may have been audiomate.

Then share a link to it here, so we can check.

lissyx · February 8, 2021, 8:50am

Maybe --validate_label ?
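As far as I can tell from the importer’s warning, --validate_label_locale is unrelated to the SoxiError; it only controls how transcripts are validated. My understanding is that it takes the path to a Python file defining a validate_label(label) function that returns the cleaned label, or None to skip the sample. A hypothetical French version under that assumption; the allowed character set and punctuation handling are illustrative only, not the ones used by any released model:

```python
import re
import unicodedata

# Hypothetical allowed alphabet: a-z, common French accented letters and ligatures,
# apostrophe, space and hyphen. Adjust to the alphabet your model actually uses.
_ALLOWED = re.compile(r"^[a-zàâäçéèêëîïôöùûüÿœæ' -]+$")


def validate_label(label):
    """Return a normalised lowercase label, or None if it contains disallowed characters."""
    label = unicodedata.normalize("NFC", label).lower()
    label = label.replace("’", "'")               # typographic apostrophe -> ASCII
    label = re.sub(r'[.,;:!?«»"]', " ", label)    # replace punctuation with spaces
    label = re.sub(r"\s+", " ", label).strip()    # collapse runs of whitespace
    if not label or not _ALLOWED.match(label):
        return None
    return label
```

Save it as a file (for example validate_label_fr.py, a hypothetical name) and pass its path with --validate_label_locale.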

LucieDevGirl · February 9, 2021, 5:11pm

Yes, here is my Colab:

https://drive.google.com/file/d/150dZpMZ87-0cboAx06htbbHiLqUILOUM/view?usp=sharing

After this I need to learn how to use Docker, and maybe switch to Linux.

lissyx · February 9, 2021, 5:15pm

We have released models that you can use directly …

othiele · February 9, 2021, 7:22pm

Ideally, set up the Colab so we can run it too. You seem to upload files yourself, which is not reproducible. And please follow up on the error messages. The current one hints at a wrong path or a missing file:

OSError: input_filepath CORPUS/cv-corpus-6.1-2020-12-11/fr/clips/common_voice_fr_19738183.mp3 does not exist.
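The path in this error is relative (CORPUS/cv-corpus-6.1-2020-12-11/...), so it is resolved against the current working directory of the process, not against the script’s location; in Colab that working directory is usually /content. A minimal sketch that shows where such a relative path actually points (describe_input_path is a hypothetical helper):

```python
import os


def describe_input_path(path):
    """Show how a (possibly relative) path is resolved against the current working directory."""
    absolute = os.path.abspath(path)  # relative paths are joined onto os.getcwd()
    return {
        "cwd": os.getcwd(),
        "absolute": absolute,
        "exists": os.path.isfile(absolute),
        "parent_exists": os.path.isdir(os.path.dirname(absolute)),
    }
```

If "parent_exists" is False, the importer is being run from a different directory than the one containing CORPUS, or the corpus was uploaded somewhere else; passing absolute paths to the importer avoids the ambiguity.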