Following the training docs, I started training an English-language model from scratch on a GPU (2060), but the training loss is slowly increasing. Training is currently at Epoch: 0 | Step:106970 | loss:140.503940. The loss came down to about 70 during the first 10 iterations, but since then it has been increasing slowly.
And how many steps are there in a single epoch?
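My assumption is that one step is one batch, so I would expect the number of steps per epoch to be roughly the dataset size divided by the batch size. Here is a small sketch of that arithmetic. The function name `steps_per_epoch`, the `drop_last` flag, and the numbers are made up for illustration, and I am not sure this is how the training script actually counts steps.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches needed to see every sample once (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Incomplete final batch is skipped.
        return num_samples // batch_size
    # Incomplete final batch still counts as one step.
    return math.ceil(num_samples / batch_size)


# Made-up numbers, not my actual dataset or config.
print(steps_per_epoch(num_samples=1_000_000, batch_size=8))
```

If that is right, being at step 106970 while still in epoch 0 would just mean my dataset is large compared to the batch size.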