Hi, I ran into a problem when running import_cv2.py.
Loading test.tsv, dev.tsv and train.tsv completes fine at 100%, but loading validated.tsv stops at 78% with the following error:
Progress |######################################################################################################################################### | 78% completed
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "bin/import_cv2.py", line 71, in one_sample
    subprocess.check_output(
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['soxi', '-s', '../cv-corpus-6.1-2020-12-11/fr/clips/common_voice_fr_17892259.wav']' returned non-zero exit status 1.
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "bin/import_cv2.py", line 221, in <module>
    main()
  File "bin/import_cv2.py", line 216, in main
    _preprocess_data(PARAMS.tsv_dir, audio_dir, PARAMS.space_after_every_character)
  File "bin/import_cv2.py", line 172, in _preprocess_data
    set_samples = _maybe_convert_set(dataset, tsv_dir, audio_dir, space_after_every_character)
  File "bin/import_cv2.py", line 127, in _maybe_convert_set
    for i, processed in enumerate(pool.imap_unordered(one_sample, samples), start=1):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 868, in next
    raise value
subprocess.CalledProcessError: Command '['soxi', '-s', '../cv-corpus-6.1-2020-12-11/fr/clips/common_voice_fr_17892259.wav']' returned non-zero exit status 1.
So I checked common_voice_fr_17892259.wav: it's a 0-byte file, which explains the error. BUT I had already hit this issue before. That time I found 5 or 6 .wav files of 0 bytes, went to validated.tsv and removed the lines referencing those files.
Then I ran import_cv2.py again and got the same error, again at 78% (with a different .wav file this time)! I went back to my clips folder and can see that there are now more than 10 .wav files of 0 bytes!
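The manual cleanup described above (finding the 0-byte .wav files and deleting the matching lines from validated.tsv) could be scripted. Below is a minimal sketch, not part of import_cv2.py. The helper names `find_empty_wavs` and `filter_tsv` are hypothetical, and it assumes the TSV has a header row with a `path` column holding the clip file name (with whatever extension), so clips are matched by file name without extension. It does not explain why the empty files appear, so the scan would need to be repeated if new ones show up.

```python
from pathlib import Path


def find_empty_wavs(clips_dir):
    """Return the stems (file names without extension) of all 0-byte .wav files."""
    return {p.stem for p in Path(clips_dir).glob("*.wav") if p.stat().st_size == 0}


def filter_tsv(tsv_in, tsv_out, bad_stems, path_column="path"):
    """Copy tsv_in to tsv_out, skipping rows whose clip stem is in bad_stems.

    Assumes a tab-separated file with a header row containing `path_column`.
    Returns the number of rows that were dropped.
    """
    dropped = 0
    with open(tsv_in, encoding="utf-8", newline="") as src, \
         open(tsv_out, "w", encoding="utf-8", newline="") as dst:
        header = src.readline()
        dst.write(header)
        # Locate the column that holds the clip file name.
        col = header.rstrip("\r\n").split("\t").index(path_column)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > col and Path(fields[col]).stem in bad_stems:
                dropped += 1
                continue
            # Untouched rows are written back byte-for-byte.
            dst.write(line)
    return dropped
```

For example, `bad = find_empty_wavs("../cv-corpus-6.1-2020-12-11/fr/clips")` followed by `filter_tsv("validated.tsv", "validated_clean.tsv", bad)` would write a cleaned copy; the TSV file names and locations here are placeholders. Writing to a new file rather than editing in place keeps the original validated.tsv available for comparison.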
What are these corrupted .wav files? Does anyone know how to solve this problem?
PS: I should mention that I'm not using the pretrained French model because I don't know how to use it; I'm not yet familiar with Docker.