Training from scratch (English)

Following the training docs, I started training English from scratch on a GPU (RTX 2060), but the training loss is slowly increasing. Training is currently at Epoch: 0 | Step: 106970 | loss: 140.503940. The loss dropped to about 70 during the first 10 iterations, but it has been climbing slowly ever since.

And how many steps are there in a single epoch?

Depends on your dataset …

I am using Mozilla’s CommonVoice v2.0 dataset for English.

number of training steps per epoch = number of audio files / train batch size
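A quick way to get both numbers, assuming import_cv2.py wrote a train.csv with a single header row (the path and batch size below are placeholders; substitute your own):

```sh
# Estimate steps per epoch from the train CSV and the batch size.
TRAIN_CSV=clips/train.csv              # placeholder path from the import step
BATCH=8                                # whatever you pass as --train_batch_size
ROWS=$(($(wc -l < "$TRAIN_CSV") - 1))  # subtract the CSV header line
echo "steps per epoch: $((ROWS / BATCH))"
```

If I remember correctly, the default train batch size is 1, in which case one epoch is one step per audio file, which would explain why epoch 0 can run past 100k steps.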

Have you tried setting the train batch size to 8? This would speed things up enormously, if your GPU can handle it.

Model size? Batch size? You could share a bit more information about your training parameters …

I’m using all the default parameters; I followed the training documentation (TRAINING.rst) exactly.
Steps I did:

  1. Downloaded the requirements and the dataset.
  2. Ran import_cv2.py on the entire downloaded dataset.
  3. Ran the DeepSpeech.py script with --train_files, --dev_files, --test_files, and --use_allow_growth; everything else is left at its default (see the example command below).
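For reference, a command along the lines of what step 3 describes; the CSV paths are assumptions based on where import_cv2.py usually writes its output, so adjust them to your setup:

```sh
python3 DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --use_allow_growth
```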

The GPU I’m using is an NVIDIA GeForce RTX 2060 Super (8 GB).

It seems to me that you have a very big dataset. As suggested, try increasing the batch size via the flags described in the documentation. You can start with a batch size of 64 to see if your GPU can handle it.

If you receive an Out of Memory (OOM) error, reduce the batch size gradually: 32, 16, 8, 4, 2, then 1. Bear in mind that decreasing the batch size means slower training.

You can simply add the following flags:
--train_batch_size 64 --dev_batch_size 64 --test_batch_size 64
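If you want to automate the OOM backoff described above, here is a minimal sketch; the retry loop is my own illustration (not from the documentation), and the CSV paths are the same placeholders as in the earlier example:

```sh
# Retry training with progressively smaller batch sizes until one fits
# in GPU memory. An OOM typically makes DeepSpeech.py exit non-zero,
# so the loop simply moves on to the next (smaller) size.
for BS in 64 32 16 8 4 2 1; do
  if python3 DeepSpeech.py \
       --train_files clips/train.csv \
       --dev_files clips/dev.csv \
       --test_files clips/test.csv \
       --train_batch_size "$BS" \
       --dev_batch_size "$BS" \
       --test_batch_size "$BS" \
       --use_allow_growth; then
    echo "Training completed with batch size $BS"
    break
  fi
  echo "Batch size $BS failed (likely OOM); trying a smaller one..."
done
```

Note that each failed attempt restarts training from scratch, so in practice you would watch for the OOM early in the run rather than let a large batch size train for hours first.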