I'm not quite sure about the first question. For the second question: the number of steps per epoch equals the amount of data divided by the batch size, so you just need to increase the batch size to reduce the number of steps. Keep in mind, though, that the batch size is an important value for estimating the gradient, so a batch size that is too big may cause some problems.
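To make the arithmetic concrete, here is a minimal sketch. It assumes one step processes one batch and that a final partial batch either counts as an extra step or is dropped. The function `steps_per_epoch` and its `drop_last` flag are hypothetical names for illustration, not any particular library's API.

```python
def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Hypothetical helper: steps needed to go through the data once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Assumption: the final partial batch is discarded
        return num_examples // batch_size
    # Assumption: the final partial batch is kept as one extra step (integer ceiling)
    return (num_examples + batch_size - 1) // batch_size


# Doubling the batch size roughly halves the number of steps per epoch
for bs in (32, 64, 128):
    print(bs, steps_per_epoch(10_000, bs))
```

With 10,000 examples this gives 313, 157 and 79 steps for batch sizes 32, 64 and 128, which is why increasing the batch size is the direct way to cut the step count.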