I wanted to train my model on the full http://openslr.org/37/ dataset. I use Google Colab for training, so I ran the preprocess script from the util folder to build the feature caches.
The whole dataset contains ~200,000 files, and my training set contains ~160,000 of them (80% of the total data).
When I ran the preprocess script it did work, but preprocessing was taking a long time and using a lot of CPU, memory, and IO. It took almost 3-4 hours to process ~100,000 files (I was using `less -N +F` to monitor progress), but then my computer froze. I had to force-shut it down after another 2-3 hours, and no cache had been saved to disk.
How should I run the preprocess script so that it builds the feature caches efficiently and doesn't lose all progress if the machine goes down partway through?
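Ideally the cache would be written incrementally, one file at a time, so a crash only loses the file currently being processed and a re-run can resume where it left off. Something like the sketch below is what I have in mind; note that `extract_features`, the one-`.npy`-per-clip cache layout, and the directory names are my own assumptions, not the actual script's API:

```python
import os
import wave
from pathlib import Path

import numpy as np


def extract_features(wav_path: Path) -> np.ndarray:
    # Placeholder standing in for whatever the real preprocess script
    # computes (filterbanks/MFCCs etc.); here it just loads normalized PCM.
    with wave.open(str(wav_path), "rb") as w:
        pcm = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return pcm.astype(np.float32) / 32768.0


def build_cache(wav_dir: str, cache_dir: str) -> None:
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        out_path = cache / (wav_path.stem + ".npy")
        if out_path.exists():
            continue  # already cached: a re-run resumes after a crash
        feats = extract_features(wav_path)
        tmp_path = out_path.with_suffix(".npy.tmp")
        with open(tmp_path, "wb") as f:
            np.save(f, feats)           # write to a temp file first...
        os.replace(tmp_path, out_path)  # ...then rename atomically, so the
                                        # cache never contains partial files


if __name__ == "__main__":
    # Hypothetical paths; the real dataset/cache locations would differ.
    build_cache("asr_bengali/data", "feature_cache")
```

With per-file caches like this I could also process the dataset in chunks and restart the machine between chunks without losing earlier work. Is there a supported way to get this behavior from the existing preprocess script, or a recommended alternative?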