Preprocessing step taking a long time

jahir · November 7, 2018, 12:41pm

I am running DeepSpeech (0.3) on Google Colab. My whole dataset consists of 30000 files, and the training set contains ~22000 files. The preprocessing step is taking a lot of time: preprocessing the training set has been running for 2+ hours and still has not completed.

The training file specifications are:

Sample Rate: 16000
Channel: 1
Encoding: 16 bit signed Integer PCM
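
A quick way to confirm that a file really matches the specifications above is to read its header with Python's standard `wave` module. This is only a minimal sketch, not part of DeepSpeech: the names `EXPECTED`, `wav_spec`, and `matches_expected` are made up for illustration, and it assumes the files are plain PCM WAV, which is the only kind the `wave` module reads.

```python
import wave

# Expected format from the post: 16 kHz, mono, 16-bit PCM (2 bytes per sample).
EXPECTED = {"sample_rate": 16000, "channels": 1, "sample_width_bytes": 2}


def wav_spec(path):
    """Read the header of a PCM WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        frame_rate = wav.getframerate()
        return {
            "sample_rate": frame_rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": wav.getnframes() / frame_rate,
        }


def matches_expected(path):
    """Return True if the file has the expected rate, channels, and sample width."""
    spec = wav_spec(path)
    return all(spec[key] == value for key, value in EXPECTED.items())
```

Calling `matches_expected` on each file under `dataset/wav` would flag any file that deviates from the expected format, and `duration_seconds` can help spot unusually long clips.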

Dataset folder structure is as follows:

dataset
     wav
          audio_file1.wav
          audio_file2.wav
          ...............

Previously, I trained with ~1500 audio files and preprocessing completed in seconds, so I can’t figure out the issue here. What might be the cause, other than Colab?

lissyx · November 7, 2018, 1:51pm

Processing 1500 files took how many seconds, and was that on the same hardware? Your dataset is now about 34 times larger, with 52000 files to process.
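
The scaling argument can be sketched as a simple extrapolation. This assumes preprocessing time grows linearly with the number of files, which is a simplification (file lengths, disk speed, and Colab's shared hardware all affect it), and `estimate_seconds` is a hypothetical helper, not a DeepSpeech function.

```python
def estimate_seconds(small_files, small_seconds, large_files):
    """Extrapolate preprocessing time, assuming it scales linearly with file count."""
    if small_files <= 0:
        raise ValueError("small_files must be positive")
    return small_seconds * (large_files / small_files)


# The 30-second figure is a made-up placeholder; substitute the measured time
# for the 1500-file run on the same hardware.
estimated = estimate_seconds(small_files=1500, small_seconds=30, large_files=52000)
print(f"Estimated preprocessing time: {estimated / 60:.1f} minutes")
```

If the measured time for the larger run is far above such an estimate, the slowdown is more likely caused by the environment than by the dataset size alone.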

jahir · November 7, 2018, 2:35pm

I think I posted a bit prematurely. The problem was with Google Colab. Retrying a few more times solved the issue.
