Training from scratch (English)

Following the training docs, I started training English from scratch on a GPU (RTX 2060), but the training loss is slowly increasing. Training is currently at Epoch: 0 | Step: 106970 | loss: 140.503940. The loss dropped to about 70 during the first 10 iterations, but it has been climbing slowly ever since.

And how many steps are there in a single epoch?

Depends on your dataset …

I am using Mozilla’s CommonVoice v2.0 dataset for English.

number of training steps per epoch = number of audio files / train batch size
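A quick way to get both numbers, assuming import_cv2.py wrote a train.csv with a single header row (the path and batch size below are placeholders; substitute your own):

```sh
# Estimate steps per epoch from the train CSV and the batch size.
TRAIN_CSV=clips/train.csv              # placeholder path from the import step
BATCH=8                                # whatever you pass as --train_batch_size
ROWS=$(($(wc -l < "$TRAIN_CSV") - 1))  # subtract the CSV header line
echo "steps per epoch: $((ROWS / BATCH))"
```

If I remember correctly, the default train batch size is 1, in which case one epoch is one step per audio file, which would explain why epoch 0 can run past 100k steps.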

Have you tried setting the train batch size to 8? This would speed things up enormously, if your GPU can handle it.

Model size? Batch size? You could share a bit more information about your training parameters …

I’m using all the default parameters; I followed the training documentation (TRAINING.rst) exactly.
Steps I did:

  1. Downloaded the requirements and the dataset.
  2. Ran import_cv2.py on the entire downloaded dataset.
  3. Ran the DeepSpeech.py script with --train_files, --dev_files, --test_files, and --use_allow_growth; everything else is left at its default (see the example command below).
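For reference, a command along the lines of what step 3 describes; the CSV paths are assumptions based on where import_cv2.py usually writes its output, so adjust them to your setup:

```sh
python3 DeepSpeech.py \
  --train_files clips/train.csv \
  --dev_files clips/dev.csv \
  --test_files clips/test.csv \
  --use_allow_growth
```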

The GPU I’m using is an NVIDIA GeForce RTX 2060 Super (8 GB).

It seems to me that you have a very big dataset. As suggested, try increasing the batch size via the flags described in the documentation. You can start with a batch size of 64 to see if your GPU can handle it.

If you receive an Out of Memory (OOM) error, reduce the batch size gradually: 32, 16, 8, 4, 2, then 1. Bear in mind that decreasing the batch size means slower training.

You can simply add the following flags:
--train_batch_size 64 --dev_batch_size 64 --test_batch_size 64
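If you want to automate the OOM backoff described above, here is a minimal sketch; the retry loop is my own illustration (not from the documentation), and the CSV paths are the same placeholders as in the earlier example:

```sh
# Retry training with progressively smaller batch sizes until one fits
# in GPU memory. An OOM typically makes DeepSpeech.py exit non-zero,
# so the loop simply moves on to the next (smaller) size.
for BS in 64 32 16 8 4 2 1; do
  if python3 DeepSpeech.py \
       --train_files clips/train.csv \
       --dev_files clips/dev.csv \
       --test_files clips/test.csv \
       --train_batch_size "$BS" \
       --dev_batch_size "$BS" \
       --test_batch_size "$BS" \
       --use_allow_growth; then
    echo "Training completed with batch size $BS"
    break
  fi
  echo "Batch size $BS failed (likely OOM); trying a smaller one..."
done
```

Note that each failed attempt restarts training from scratch, so in practice you would watch for the OOM early in the run rather than let a large batch size train for hours first.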