Based on the training docs, I started training an English model from scratch on a GPU (2060), but the training loss is increasing slowly. Training is currently at Epoch: 0 | Step: 106970 | loss: 140.503940. The loss came down to about 70 during the first 10 iterations, but since then it has been slowly increasing.
And how many steps are there in a single epoch?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
It seems to me like you have a very big dataset. As suggested, try increasing the batch size by adding the flags below, as described in the documentation. Start with a batch size of 64 to see whether your GPU can handle it.
If you get an Out of Memory (OOM) error, reduce the batch size gradually (32, 16, 8, 4, 2, 1). Note that a smaller batch size also means slower training.
You can simply add the following flags:
--train_batch_size 64 --dev_batch_size 64 --test_batch_size 64
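For context, here is a minimal sketch of how those flags fit into a training invocation. The CSV paths, checkpoint directory, and epoch count are placeholders for illustration only; substitute your own:

python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --train_batch_size 64 \
  --dev_batch_size 64 \
  --test_batch_size 64 \
  --epochs 30 \
  --checkpoint_dir checkpoints/

As for your question about steps per epoch: it is roughly the number of samples in your train CSV divided by the train batch size, so increasing the batch size also reduces the number of steps in each epoch.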