This is not accurate. Check this out
After one and a half years, I come back to my answer because my previous answer was wrong.
Batch size impacts learning significantly. What happens when you put a batch through your network is that you average the gradients. The concept is that if your batch size is big enough, this will provide a stable enough estimate of what the gradient of the full dataset would be. By taking samples from your dataset, you estimate the gradient while reducing computational cost significantly. The lower you go, the less accurate your esttimate will be, however in some cases these noisy gradients can actually help escape local minima. When it is too low, your network weights can just jump around if your data is noisy and it might be unable to learn or it converges very slowly, thus negatively impacting total computation time.
Another advantage of batching is for GPU computation, GPUs are very good at parallelizing the calculations that happen in neural networks if part of the computation is the same (for example, repeated matrix multiplication over the same weight matrix of your network). This means that a batch size of 16 will take less than twice the amount of a batch size of 8.
In the case that you do need bigger batch sizes but it will not fit on your GPU, you can feed a small batch, save the gradient estimates and feed one or more batches, and then do a weight update. This way you get a more stable gradient because you increased your virtual batch size.
WRONG, OLD ANSWER: [[[No, the batch_size on average only influences the speed of your learning, not the quality of learning. The batch_sizes also don’t need to be powers of 2, although I understand that certain packages only allow powers of 2. You should try to get your batch_size the highest you can that still fits the memory of your GPU to get the maximum speed possible.]]]]
credit:-
Theoretically, you’re only right when every sample conforms to a strict set of rules and the network is guaranteed to converge to that function, which is (sweepingly) never the case.
Correct me if I am wrong to believe this is accurate.