The import is almost done when it throws an error 76% of the way through dev.tsv. dev.csv is not created, but train.csv and test.csv are. Any ideas what is happening?
@kmwerts Please avoid posting screenshots; they are unusable and difficult to read. Have you tried running the soxi command by hand to see the error it produces?
When I run `soxi -s common_voice_en_18626157.wav`, it complains that the file does not have a header:
`soxi FAIL formats: can't open input file 'dataset/clips/common_voice_en_18626157.wav': WAVE: RIFF header not found`
So I removed the file from the system and ran the import again, then got the same error from the 159.wav file. After that I wondered if my download or wav creation had had some kind of issue (I ran out of space on my machine a couple of times), so I wiped the whole data set, re-extracted, and reran import_cv2.py. Now I'm just getting the same error on a different file.
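Since the error keeps moving to a different file, it might help to find every broken wav in one pass instead of one per import run. Here is a minimal sketch that only uses the standard library and checks for the same thing soxi complains about, a missing RIFF/WAVE header. The function names `has_riff_header` and `find_bad_wavs` are made up for this example, and it assumes the clips are plain RIFF wav files sitting in a single directory.

```python
from pathlib import Path


def has_riff_header(path):
    """Return True if the file starts with a 12-byte RIFF/WAVE header."""
    with open(path, "rb") as f:
        header = f.read(12)
    # Bytes 0-3 are the chunk id, bytes 8-11 are the format tag.
    return len(header) == 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def find_bad_wavs(clips_dir):
    """List the .wav files in clips_dir that lack a RIFF/WAVE header."""
    return sorted(p for p in Path(clips_dir).glob("*.wav") if not has_riff_header(p))


if __name__ == "__main__":
    # "dataset/clips" is the directory from the error message above.
    for bad in find_bad_wavs("dataset/clips"):
        print("missing RIFF header:", bad)
```

An empty or truncated wav (for example one written while the disk was full) would show up in this list, which would at least tell you how many files are affected.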
It looks like import_cv2.py silently passes on any SoxError raised while the wav files are being created, so maybe I need to log some information instead of passing.
Currently running a new import that should print any SoxError. If that doesn't shed any light, my only other thought is just to omit files that throw a SoxError, but that doesn't explain why this is happening to me while no one else seems to be having a problem with the Common Voice data.
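For the "omit files that throw an error" idea, a rough sketch of what that could look like is below. This is not the actual code in import_cv2.py: `convert_or_skip` is a hypothetical helper, and `convert` and `error_type` are placeholders so the example runs without SoX installed. In the importer they would presumably be the SoX conversion call and `sox.core.SoxError`.

```python
import os


def convert_or_skip(pairs, convert, error_type=Exception):
    """Convert each (src, dst) pair; log and skip the ones that fail.

    `convert` and `error_type` are placeholders for the real conversion
    call and the exception it raises.
    """
    converted, skipped = [], []
    for src, dst in pairs:
        try:
            convert(src, dst)
        except error_type as ex:
            # Report the failure instead of silently passing.
            print("SoX processing error", ex, src, dst)
            # Remove any partial output so later steps do not trip on it.
            if os.path.exists(dst):
                os.remove(dst)
            skipped.append(src)
        else:
            converted.append(dst)
    return converted, skipped
```

Returning the skipped list makes it possible to drop those samples from the csv as well, rather than leaving rows that point at missing or headerless wav files.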
Well, this is wild. I added:
except sox.core.SoxError as ex: print('SoX processing error', ex, orig_filename, wav_filename)
to import_cv2.py where it was creating the wav files, and I never saw a single error after that. Everything was imported properly, csvs all got made, and I’m currently training.
No idea what the issue was.