I want to ask whether the dev set affects training if it's really bad. Also, for your number of hours of data, how many epochs does it usually take until training is done (i.e. the loss stops decreasing)? Thanks everyone
dev is not used in training. If it is really bad, it just means that the model picked as your "best" model might not actually be the best one.
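To illustrate the point, here is a rough sketch of how a dev loss is typically used only to pick the "best" epoch and to decide when to stop, without touching the weights. This is not the actual code of any toolkit; `select_best_epoch`, `patience`, and the loss values are all made up for the example.

```python
def select_best_epoch(dev_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a list of per-epoch dev losses.

    Hypothetical helper: the dev loss only decides which epoch is kept
    as "best" and when to stop; it never updates the model itself.
    """
    best_epoch = 0
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch, loss in enumerate(dev_losses):
        if loss < best_loss:
            # New best dev loss: this epoch's checkpoint would be saved.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # Dev loss has not improved for `patience` epochs: stop.
                return best_epoch, epoch

    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(dev_losses) - 1


# Made-up dev losses, one per epoch.
dev_losses = [2.0, 1.5, 1.2, 1.3, 1.25, 1.4, 1.1]
best_epoch, stop_epoch = select_best_epoch(dev_losses, patience=3)
print(best_epoch, stop_epoch)
```

If the dev set is noisy or unrepresentative, the losses in that list are unreliable, so the epoch kept as "best" (and the point where training stops) may be the wrong one, even though the training itself was unaffected.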
The usual time until convergence depends a lot on the size of the dataset, the GPUs, the batch size, etc.
For 10 hours of data, it's about 20 hours of training in total on a single GPU with a batch_size of 8.