What is the ideal batch size?

I’m wondering what the ideal batch size is, theoretically or as best practice.
On GitHub’s release page, I can see that train_batch_size is set to 128.

But why 128?
Is it best practice, or just the biggest number that fits on a Quadro RTX 6000?

Since my GPU is an RTX 2080 Ti, a batch size of 128 is not feasible. Should I just choose the biggest batch size that fits? Or should I consider a step concept like YOLO’s?

Thanks in advance.

I’m not sure we ever heavily benchmarked batch size with respect to accuracy. Batch size will depend on both your hardware and your dataset. With the same GPU as you, I can safely push to 96 with French data when Automatic Mixed Precision is enabled, and am limited to 64 when it’s disabled.
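If the training code is PyTorch-based, enabling AMP looks roughly like this. This is a minimal sketch, not the project’s actual training loop; the model, optimizer, and shapes are placeholders:

```python
import torch

# Minimal AMP training-step sketch (all names here are placeholders).
# GradScaler and autocast are disabled on CPU-only machines, so the
# same code runs with or without a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(32, 16, device=device)
y = torch.randn(32, 4, device=device)

optimizer.zero_grad()
# autocast runs the forward pass in mixed precision where supported
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()   # scale loss to avoid fp16 underflow
scaler.step(optimizer)          # unscales grads, then optimizer.step()
scaler.update()
print(float(loss))
```

Because activations and gradients are stored in half precision under autocast, the same GPU memory holds a noticeably larger batch, which matches the 96-vs-64 numbers above.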


And just for reference, a higher batch size will speed up training significantly. So set it as high as you can without getting an out-of-memory error over a full epoch. Your results might be a tiny bit worse, but training will take only half, or even a quarter, of the time.
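If 128 doesn’t fit on the 2080 Ti, gradient accumulation (the “step” idea behind YOLO’s subdivisions) reproduces the large-batch gradient exactly for a mean-based loss: average the gradients of several micro-batches before each optimizer update. A NumPy sketch for linear least squares, purely illustrative and not tied to this project’s code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))   # full batch of 128 examples
y = rng.normal(size=(128,))
w = rng.normal(size=(8,))

def grad(Xb, yb, w):
    # gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# gradient computed on the full batch of 128 at once
g_full = grad(X, y, w)

# same gradient accumulated over 4 micro-batches of 32
g_acc = np.zeros_like(w)
for i in range(0, 128, 32):
    g_acc += grad(X[i:i + 32], y[i:i + 32], w)
g_acc /= 4  # average the micro-batch gradients

print(np.allclose(g_full, g_acc))  # → True
```

So the effective batch size is 128 even though only 32 examples are resident in memory at a time; the trade-off is 4 forward/backward passes per weight update instead of 1.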
