Hey @ftyers, I followed your suggestions and was able to reduce my loss considerably. Thanks for that.
However, after about 80 epochs, it's showing abnormal behaviour (or so I think), and the validation loss isn't decreasing below 107.
Since Colab runtimes stop after 24 hours even on the Pro versions, I trained for 24 or so epochs and then had to resume training from the saved checkpoint. I removed "--drop_source_layers" since I was training from my own new checkpoint.
Right now, my run looks like this:
/content/DeepSpeech
I0505 22:14:59.644571 139647931729792 utils.py:157] NumExpr defaulting to 4 threads.
I Loading best validating checkpoint from /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2206411
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 | Training | Elapsed Time: 1:52:18 | Steps: 9736 | Loss: 65.438248
Epoch 0 | Validation | Elapsed Time: 0:29:19 | Steps: 2087 | Loss: 110.117400 | Dataset: /content/dev.csv
I Saved new best validating model with loss 110.117400 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2216147
--------------------------------------------------------------------------------
Epoch 1 | Training | Elapsed Time: 0:37:36 | Steps: 9736 | Loss: 70.003445
Epoch 1 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.734356 | Dataset: /content/dev.csv
I Saved new best validating model with loss 109.734356 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2225883
--------------------------------------------------------------------------------
Epoch 2 | Training | Elapsed Time: 0:37:45 | Steps: 9736 | Loss: 69.737636
Epoch 2 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.221892 | Dataset: /content/dev.csv
I Saved new best validating model with loss 109.221892 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2235619
--------------------------------------------------------------------------------
Epoch 3 | Training | Elapsed Time: 0:37:41 | Steps: 9736 | Loss: 69.273269
Epoch 3 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.556368 | Dataset: /content/dev.csv
I Saved new best validating model with loss 108.556368 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2245355
--------------------------------------------------------------------------------
Epoch 4 | Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 68.330056
Epoch 4 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.450104 | Dataset: /content/dev.csv
I Saved new best validating model with loss 108.450104 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2255091
--------------------------------------------------------------------------------
Epoch 5 | Training | Elapsed Time: 0:37:49 | Steps: 9736 | Loss: 68.186690
Epoch 5 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.932049 | Dataset: /content/dev.csv
I Saved new best validating model with loss 107.932049 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2264827
--------------------------------------------------------------------------------
Epoch 6 | Training | Elapsed Time: 0:37:38 | Steps: 9736 | Loss: 67.422159
Epoch 6 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.247110 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 7 | Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 67.124132
Epoch 7 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.619550 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 8 | Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 66.913915
Epoch 8 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.552628 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 9 | Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 66.194061
Epoch 9 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.298979 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 10 | Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 73.806196
Epoch 10 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.055614 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 11 | Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 73.326642
Epoch 11 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.570137 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 12 | Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 73.448959
Epoch 12 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.844521 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 13 | Training | Elapsed Time: 0:37:46 | Steps: 9736 | Loss: 73.072213
Epoch 13 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.568916 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 14 | Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 72.634806
Epoch 14 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.015372 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 15 | Training | Elapsed Time: 0:37:36 | Steps: 9736 | Loss: 72.474703
Epoch 15 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.031187 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 16 | Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 71.875592
Epoch 16 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.453461 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 17 | Training | Elapsed Time: 0:37:38 | Steps: 9736 | Loss: 71.264576
Epoch 17 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.374969 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 18 | Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 70.779961
Epoch 18 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.202204 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 19 | Training | Elapsed Time: 0:37:37 | Steps: 9736 | Loss: 70.823069
Epoch 19 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.686278 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 20 | Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 70.850567
Epoch 20 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.624760 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 21 | Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 70.375931
Epoch 21 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.851938 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 22 | Training | Elapsed Time: 0:37:40 | Steps: 9736 | Loss: 70.420728
Epoch 22 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.534752 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 23 | Training | Elapsed Time: 0:37:37 | Steps: 9736 | Loss: 69.738112
Epoch 23 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.291176 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 24 | Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 69.809616
Epoch 24 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.992116 | Dataset: /content/dev.csv
As you can see, it seems to be stuck around this loss. What should I do?
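
One way to put numbers on the plateau is to pull the per-epoch losses out of a log like the one above. This is only a minimal sketch, assuming the log lines keep the exact "Epoch N | Training/Validation | ... | Loss: X" layout shown here; the helper names parse_losses and best_validation_epoch are made up for illustration and are not part of DeepSpeech.

```python
import re

# Matches lines such as:
# Epoch 5 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.932049 | Dataset: /content/dev.csv
EPOCH_LINE = re.compile(r"^Epoch (\d+) \| (Training|Validation) \|.*?\| Loss: ([0-9.]+)")


def parse_losses(log_text):
    """Return {epoch: {"Training": loss, "Validation": loss}} parsed from log text."""
    losses = {}
    for line in log_text.splitlines():
        match = EPOCH_LINE.match(line.strip())
        if match:
            epoch = int(match.group(1))
            phase = match.group(2)
            losses.setdefault(epoch, {})[phase] = float(match.group(3))
    return losses


def best_validation_epoch(losses):
    """Return (epoch, loss) for the lowest validation loss in the parsed log."""
    scored = [(epoch, v["Validation"]) for epoch, v in losses.items() if "Validation" in v]
    return min(scored, key=lambda item: item[1])


# Two epochs copied from the log above, used as sample input.
sample_log = """
Epoch 5 | Training | Elapsed Time: 0:37:49 | Steps: 9736 | Loss: 68.186690
Epoch 5 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.932049 | Dataset: /content/dev.csv
Epoch 6 | Training | Elapsed Time: 0:37:38 | Steps: 9736 | Loss: 67.422159
Epoch 6 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.247110 | Dataset: /content/dev.csv
"""

losses = parse_losses(sample_log)
best_epoch, best_loss = best_validation_epoch(losses)
for epoch in sorted(losses):
    gap = losses[epoch]["Validation"] - losses[epoch]["Training"]
    print(f"epoch {epoch}: validation - training = {gap:.2f}")
print(f"best validation loss {best_loss} at epoch {best_epoch}")
```

Run over the whole log above, the lowest validation loss is the 107.932049 from epoch 5, and the validation loss stays roughly 34 to 45 above the training loss in every epoch.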