Hello,
I’ve been experimenting with different training batch sizes for fine-tuning and consistently see a higher final test loss at larger training batch sizes.
At the start of every run I reset to the officially released checkpoint, use the same training/validation/test splits, and change only the training batch size; each batch size was run 3 times.
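For reference, here is a minimal sketch of the sweep I’m describing, assuming a Hugging Face Trainer-style setup. The checkpoint id, the dataset variables (`train_ds`, `val_ds`, `test_ds`), and the hyperparameters are placeholders, not my exact configuration; only the batch-size loop reflects the actual protocol.

```python
# Minimal sketch of the batch-size sweep; everything except the
# batch-size loop is a placeholder, not my exact configuration.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

results = {}
for batch_size in (2, 4, 8, 16, 32, 48):
    losses = []
    for seed in (0, 1, 2):  # each batch size repeated 3x
        # Reload the officially released checkpoint so every run
        # starts from the same weights.
        model = AutoModelForCausalLM.from_pretrained(
            "model-name")  # placeholder checkpoint id
        args = TrainingArguments(
            output_dir=f"runs/bs{batch_size}_seed{seed}",
            per_device_train_batch_size=batch_size,  # the only knob varied
            seed=seed,
        )
        trainer = Trainer(
            model=model,
            args=args,
            train_dataset=train_ds,  # identical splits in every run
            eval_dataset=val_ds,
        )
        trainer.train()
        # Evaluate on the held-out test split and record the loss.
        losses.append(trainer.evaluate(test_ds)["eval_loss"])
    results[batch_size] = sum(losses) / len(losses)
```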
The average test loss was 23.8 for training batch size 2 and 26.08 for batch size 48, and it increased monotonically across batch sizes 2, 4, 8, 16, 32, and 48.
Did you see a similar effect when choosing the default training batch size of 24?