Hi, I am running the import_cv2.py script with this command:
bin/import_cv2.py --filter_alphabet alphabet_path path_to_folder_contain_clips_and_tsv_files
I have done this before without any problem and finished training a model on a small dataset, but today, when I tried to import a bigger dataset, I got this error:
Importing mp3 files...
Progress |#############################################################################################################################################| 100% completed
Traceback (most recent call last):
File "bin/import_cv2.py", line 164, in <module>
_preprocess_data(PARAMS.tsv_dir, AUDIO_DIR, label_filter_fun, PARAMS.space_after_every_character)
File "bin/import_cv2.py", line 43, in _preprocess_data
_maybe_convert_set(input_tsv, audio_dir, label_filter, space_after_every_character)
File "bin/import_cv2.py", line 105, in _maybe_convert_set
with open(output_csv, 'w', encoding='utf-8') as output_csv_file:
OSError: [Errno 28] No space left on device: '/media/hangtg/Data/deepspeech_data_process/clips/train.csv'
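To see what the filesystem at that path actually reports, free bytes and free inodes can both be checked. Errno 28 can be raised when a filesystem runs out of inodes even though free bytes remain. The snippet below is only a minimal sketch: the helper name `space_report` is made up for illustration, `os.statvfs` is POSIX-only, and some filesystems may not report meaningful inode counts.

```python
import os
import shutil


def space_report(path):
    """Return free bytes and free inodes for the filesystem containing `path`.

    Hypothetical helper for diagnosis only; not part of import_cv2.py.
    """
    usage = shutil.disk_usage(path)   # total / used / free, in bytes
    stats = os.statvfs(path)          # POSIX only; exposes inode counters
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "total_inodes": stats.f_files,   # may be 0 on some filesystems
        "free_inodes": stats.f_favail,   # inodes available to non-root users
    }


# Example: inspect the filesystem that holds the current directory.
for key, value in space_report(".").items():
    print(f"{key}: {value}")
```

Pointing it at the folder that holds the clips (instead of `"."`) shows whether that mount is short on bytes or on inodes.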
This is ridiculous: my SSD still has about 200 GB of free space (all my code is there), and my hard drive has over 800 GB of free space (all my datasets are there). The dataset I am trying to import is about 35 GB in size. However, I did get some errors when downloading it from the Amazon S3 bucket. The AWS CLI also said "No space left on device" when I was running this command:
aws s3 cp s3://bucket-name path-to-data-folder --recursive
but I was still able to download the data; the message above only showed up for some files. After that, I got this error when trying to create the tsv files:
mutagen.mp3.HeaderNotFoundError: can't sync to MPEG frame
Again, this error only happens for some of my files, so I catch the exception to skip them and continue creating the tsv file.
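The skipping logic is roughly like the sketch below. It is a simplified illustration, not my exact script: `split_readable` and `probe` are names made up for this example, and the probe is passed in so the snippet runs without mutagen installed. Keeping a list of the skipped files makes it easy to compare them later with the files that hit "No space left on device" during the download.

```python
def split_readable(paths, probe, skip_on=(Exception,)):
    """Call probe(path) for each path and separate successes from failures.

    Returns (good, skipped): `good` holds (path, probe_result) pairs,
    `skipped` holds (path, error_text) pairs for paths where probe raised
    one of the exception types in `skip_on`.
    """
    good, skipped = [], []
    for path in paths:
        try:
            good.append((path, probe(path)))
        except skip_on as err:
            skipped.append((path, repr(err)))
    return good, skipped


# With mutagen installed, the call could look like this (assumed usage):
#   from mutagen.mp3 import MP3, HeaderNotFoundError
#   good, skipped = split_readable(
#       mp3_paths, lambda p: MP3(p).info.length, (HeaderNotFoundError,))
```

Only the files in `good` then go into the tsv file.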