I am training a model on my own Marathi dataset, but I've noticed that training is fast in the initial steps of each epoch and keeps getting slower toward the end. Is this because of SortaGrad, or is it unrelated? (I've put a small check I plan to run below the nvidia-smi output.) I have 4 GPUs, 2 Titan Xp and 2 Titan V, and I've given the training process all of them.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:18:00.0 Off |                  N/A |
| 61%   83C    P2   146W / 250W |  11961MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:3B:00.0 Off |                  N/A |
| 66%   84C    P2   157W / 250W |  11961MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 00000000:86:00.0 Off |                  N/A |
| 68%   85C    P2   164W / 250W |  11843MiB / 12196MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN Xp            Off  | 00000000:AF:00.0  On |                  N/A |
| 61%   85C    P2   184W / 250W |  11868MiB / 12193MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
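Here is the quick check I'm planning to run to see whether my training data is effectively served shortest-first, which would explain why each epoch starts fast and slows down. This is just my own diagnostic, not anything from the codebase: the path is from my setup, and I'm assuming the usual wav_filename/wav_filesize/transcript CSV layout, where file size tracks clip duration for PCM WAVs.

```python
# My own sanity check: is the train CSV ordered shortest-to-longest?
# Path and column name are from my setup; adjust as needed.
import csv

with open("data/marathi/train.csv") as f:
    sizes = [int(row["wav_filesize"]) for row in csv.DictReader(f)]

print("samples:", len(sizes))
print("already sorted by size:", sizes == sorted(sizes))
print("shortest / longest (bytes):", min(sizes), max(sizes))
```

If the sizes are (nearly) sorted, the later steps of an epoch would naturally be slower simply because the batches contain longer clips.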
Also, my loss is stuck at 164. The dataset has 4-9 word sentences, almost 8k of them in training, roughly 8 hours in total. It was generated with a TTS, so it's very clean (none of the aberrations that would come from human error).
Config:
Most flags are the defaults from flags.py; the ones I changed are:
train_batch_size=20
dev_batch_size=2
test_batch_size=2
no_earlystop
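For reference, this is roughly what those numbers work out to per epoch. Treating train_batch_size as per-GPU is my assumption about how the batch gets split across the 4 cards:

```python
# Back-of-the-envelope step count per epoch (numbers from my setup above).
train_samples = 8000          # ~8k training utterances
train_batch_size = 20         # per GPU (my assumption)
num_gpus = 4

effective_batch = train_batch_size * num_gpus
steps_per_epoch = train_samples // effective_batch
print(effective_batch, steps_per_epoch)   # -> 80 samples per step, ~100 steps per epoch
```

So there are only about 100 steps per epoch, which is why the slowdown within a single epoch is so easy to notice.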