Weird error when downloading data from an Amazon S3 bucket and when running import_cv2.py

nthanhha26 · March 10, 2020, 1:44pm

Hi, I am running the import_cv2.py script with this command:

bin/import_cv2.py --filter_alphabet alphabet_path path_to_folder_contain_clips_and_tsv_files

I have done this before without problems and finished training a model on a small dataset, but today, when I tried to import a bigger dataset, this error showed up:

Importing mp3 files...
    Progress |#############################################################################################################################################| 100% completed
    Traceback (most recent call last):
      File "bin/import_cv2.py", line 164, in <module>
        _preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
      File "bin/import_cv2.py", line 43, in _preprocess_data
        _maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
      File "bin/import_cv2.py", line 105, in _maybe_convert_set
        with open(output_csv, 'w', encoding='utf-8') as output_csv_file:
    OSError: [Errno 28] No space left on device: '/media/hangtg/Data/deepspeech_data_process/clips/train.csv'
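Errno 28 (ENOSPC) refers to the filesystem that the failing write was targeting, so one way to double-check a "No space left on device" message is to query that filesystem directly. The following is a minimal, POSIX-only sketch built on `shutil.disk_usage` and `os.statvfs`; the function name `filesystem_report` is hypothetical and the target path is copied from the traceback. It reports free inodes as well as free bytes, because a filesystem that runs out of inodes also fails with ENOSPC even when plenty of bytes remain.

```python
import os
import shutil


def filesystem_report(path):
    """Return free bytes and free inodes for the filesystem holding `path`.

    Walks up to the nearest existing parent directory, so the path of a file
    that could not be created (like the CSV in the traceback) still works.
    os.statvfs is POSIX-only.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the root without finding anything
            break
        probe = parent
    usage = shutil.disk_usage(probe)
    st = os.statvfs(probe)
    return {
        "checked": probe,
        "free_bytes": usage.free,
        # Zero free inodes also produces ENOSPC. Some filesystems report 0
        # for both counts because they have no fixed inode table; treat
        # 0/0 as "not applicable".
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


if __name__ == "__main__":
    target = "/media/hangtg/Data/deepspeech_data_process/clips/train.csv"
    for key, value in filesystem_report(target).items():
        print(f"{key}: {value}")
```

Running the same check against the download folder and the temporary directory shows whether the full filesystem is really the one you expect.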

This is ridiculous: my SSD still has about 200 GB of free space (my code is there), and my hard drive has over 800 GB free (all the datasets are there). The dataset I am trying to import is about 35 GB. I also got errors when downloading it from the Amazon bucket: the AWS CLI reported "No space left on device" while I was running this command:

aws s3 cp s3://bucket-name path-to-data-folder --recursive

Even so, I was still able to download the data; the message only appeared for some files. After that, I got this error when trying to create the tsv files:

mutagen.mp3.HeaderNotFoundError: can't sync to MPEG frame

Again, this error only happened for some of my files, so I caught the exception to skip them and carried on creating the tsv files.
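Skipping unreadable clips can be done by probing each file before it goes into the tsv. This is a minimal sketch, assuming the clips are probed with mutagen's `MP3` class (the source of the `HeaderNotFoundError` above); `load_mp3_duration` and `partition_clips` are hypothetical helpers, not part of import_cv2.py. It also flags zero-byte files, on the unconfirmed assumption that some clips may have been left empty or truncated by the disk-full errors during the download.

```python
import os


def load_mp3_duration(path):
    """Return the duration of an MP3 file in seconds using mutagen.

    Imported lazily so the helpers below work without mutagen installed.
    Raises mutagen.mp3.HeaderNotFoundError ("can't sync to MPEG frame")
    when no valid MPEG frame is found.
    """
    from mutagen.mp3 import MP3  # third-party: pip install mutagen
    return MP3(path).info.length


def partition_clips(paths, probe=load_mp3_duration, *, errors):
    """Split `paths` into (good, bad).

    good: list of (path, probe result) for files the probe accepts.
    bad:  list of (path, reason) for empty files or files whose probe
          raised one of the exception types in `errors`.
    """
    good, bad = [], []
    for path in paths:
        if os.path.getsize(path) == 0:  # empty file, e.g. an aborted download
            bad.append((path, "empty file"))
            continue
        try:
            good.append((path, probe(path)))
        except errors as exc:
            bad.append((path, str(exc)))
    return good, bad
```

With mutagen installed, a call such as `partition_clips(mp3_paths, errors=(HeaderNotFoundError,))` (after `from mutagen.mp3 import HeaderNotFoundError`; `mp3_paths` is a hypothetical list of clip paths) returns the readable clips with their durations plus a list of rejected files. Keeping that list makes it possible to re-download just those clips instead of silently dropping them.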

lissyx · March 10, 2020, 2:03pm

The dataset or the tarball?

Are you sure some temp files are not being created in $TMP (which defaults to /tmp), and that this is the filesystem you are running out of space on?
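For Python code that relies on the standard `tempfile` module, you can ask that module directly where temporary files go and how much room is left there. This is a minimal sketch; the function name `temp_dir_report` is hypothetical, and whether `import_cv2.py` (or any external program it runs) writes temporary files through `tempfile` is an assumption, not something established in this thread.

```python
import shutil
import tempfile


def temp_dir_report():
    """Return (directory, free bytes) for Python's default temp location.

    tempfile uses the first usable directory named by the TMPDIR, TEMP or
    TMP environment variables (in that order), then falls back to platform
    defaults such as /tmp, /var/tmp and /usr/tmp on POSIX systems.
    """
    path = tempfile.gettempdir()
    return path, shutil.disk_usage(path).free


if __name__ == "__main__":
    path, free = temp_dir_report()
    print(f"temp dir: {path} ({free / 1e9:.1f} GB free)")
```

If the reported directory sits on a small root partition rather than on the SSD or data drive, that partition is a likely candidate for the "No space left on device" errors.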

nthanhha26 · March 10, 2020, 2:04pm

How can I check where my temp files are created? And how do I change that location?

lissyx · March 10, 2020, 2:05pm

Well, that depends on your setup; I can’t tell you where they are created. My reply above already contains your answers.
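To move temporary files off a small partition, a common approach is to point the TMPDIR environment variable at a directory on a larger drive for a single run. In a shell this is a prefix such as `TMPDIR=/path/on/big/drive bin/import_cv2.py <same arguments as before>`; the sketch below does the same from Python with `subprocess.run`. The helper name `run_with_tmpdir` is hypothetical, and only programs that honour TMPDIR (Python's `tempfile` module does) will actually be affected.

```python
import os
import subprocess


def run_with_tmpdir(cmd, tmpdir, **kwargs):
    """Run `cmd` with TMPDIR pointing at `tmpdir`, creating it if missing.

    Extra keyword arguments are passed through to subprocess.run; the call
    raises CalledProcessError if the command exits with a non-zero status.
    """
    os.makedirs(tmpdir, exist_ok=True)
    env = dict(os.environ, TMPDIR=tmpdir)  # copy, so the parent is unchanged
    return subprocess.run(cmd, env=env, check=True, **kwargs)
```

For example, `run_with_tmpdir([sys.executable, "bin/import_cv2.py", "--filter_alphabet", "alphabet_path", "path_to_folder_contain_clips_and_tsv_files"], "/media/hangtg/Data/tmp")` would rerun the import with temporary files on the data drive; `/media/hangtg/Data/tmp` is a hypothetical scratch directory. Output is not captured by default, so the progress bar still prints live.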