If I remember correctly, that was per GPU, but search the forum; this question has come up before.
For your own experiments, start with a smaller number like 8 and go up until you get an error in the first epoch. 32 or 64 are good values for something like a V100.
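
If you would rather automate the "start at 8 and go up" search, here is a rough sketch. It assumes the failure shows up as a `RuntimeError` and that doubling is an acceptable step size. `find_largest_working_value`, `run_trial` and `fake_trial` are hypothetical names made up for illustration, not part of any library; you would replace `fake_trial` with a function that runs a short piece of your own training with the given value.

```python
from typing import Callable, Optional


def find_largest_working_value(
    run_trial: Callable[[int], None],
    start: int = 8,
    limit: int = 1024,
) -> Optional[int]:
    """Double the value from `start` until `run_trial` raises or `limit` is passed.

    Returns the largest value that ran without error, or None if even
    `start` failed.
    """
    best = None
    value = start
    while value <= limit:
        try:
            run_trial(value)
        except RuntimeError:
            # Assumption: the failure surfaces as a RuntimeError.
            break
        best = value
        value *= 2
    return best


def fake_trial(value: int) -> None:
    # Hypothetical stand-in for a short training run:
    # pretends that anything above 64 fails.
    if value > 64:
        raise RuntimeError("simulated failure")


if __name__ == "__main__":
    print(find_largest_working_value(fake_trial))
```

In practice you would probably pick something a bit below the largest value that worked, since memory use can vary between steps.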