Preprocessing large dataset causes computer to freeze

I wanted to train my model on the full http://openslr.org/37/ dataset. I train on Google Colab, so I ran the preprocess script from the util folder to build the feature caches ahead of time.

The whole dataset contains ~200,000 files; my training set is ~160,000 of them (80% of the total).

When I ran the preprocess script, it was working, although preprocessing was slow and used a lot of CPU, memory, and disk IO. It took almost 3-4 hours to process ~100,000 files (I was using `less -N +F` to tail the log and monitor progress), but then my computer froze :frowning_face: . I had to force-shut it down after another 2-3 hours, and no cache had been saved to disk :frowning_face: .

How should I run the preprocess script so that it builds the feature caches efficiently and, if something goes wrong, doesn't lose all its progress?
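
For what it's worth, here is the direction I'm considering: process the files with a small worker pool and write one cache file per utterance, so a crash only loses the files currently in flight and a rerun can skip everything already on disk. This is only a sketch; the paths, the `extract_features` placeholder, and the per-file `.npy` cache layout are my assumptions, not the actual script's API:

```python
import os
from multiprocessing import Pool
from pathlib import Path

import numpy as np

AUDIO_DIR = Path("data/asr_bengali/flac")  # assumption: root of the audio files
CACHE_DIR = Path("feature_cache")          # assumption: one .npy per utterance
NUM_WORKERS = 2                            # kept low so the machine stays responsive


def extract_features(audio_path: Path) -> np.ndarray:
    # Placeholder: the real preprocess script computes acoustic features here
    # (e.g. log-mel filterbanks). Swap in the actual extraction call.
    raw = np.fromfile(audio_path, dtype=np.uint8)
    return raw.astype(np.float32)


def process_one(audio_path: Path) -> None:
    cache_path = CACHE_DIR / (audio_path.stem + ".npy")
    if cache_path.exists():  # finished in an earlier run -> skip, nothing lost
        return
    features = extract_features(audio_path)
    # Write to a temp file first, then rename: a crash mid-write
    # can never leave a truncated cache file behind.
    tmp_path = cache_path.with_name(cache_path.name + ".tmp")
    with open(tmp_path, "wb") as f:
        np.save(f, features)
    os.replace(tmp_path, cache_path)


if __name__ == "__main__":
    CACHE_DIR.mkdir(exist_ok=True)
    files = sorted(AUDIO_DIR.glob("**/*.flac"))
    with Pool(NUM_WORKERS) as pool:
        for done, _ in enumerate(pool.imap_unordered(process_one, files), start=1):
            if done % 1000 == 0:
                print(f"{done}/{len(files)} files cached")
```

With per-file caches, restarting after a freeze just picks up where it left off, and capping `NUM_WORKERS` should stop it from saturating CPU/memory/IO like before. Does this approach make sense, or does the existing preprocess script already support something like it?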