Are you cleaning up the checkpoint directory between runs? Training with a bigger batch size should be faster, as long as it does not run out of memory (OOM), so it’s possible that the run now gets far enough to reach the test step.
Looking at your logs, the epoch number is also higher in the second one, so that would be consistent?
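
To illustrate why a bigger batch size can get further in the same amount of time, here is a rough sketch of the steps-per-epoch arithmetic. The dataset size and batch sizes are made-up numbers, not taken from your logs, and `steps_per_epoch` is just a hypothetical helper, not part of any library:

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Number of optimizer steps needed to see every sample once."""
    # The last batch may be partial, hence the ceiling.
    return math.ceil(num_samples / batch_size)


# Hypothetical dataset size, only for illustration.
num_samples = 10_000

for batch_size in (16, 64):
    print(f"batch_size={batch_size}: {steps_per_epoch(num_samples, batch_size)} steps per epoch")
```

With fewer steps per epoch, a run that fits in memory will usually finish its epochs sooner, so it may reach a later epoch (and the test step) where the smaller-batch run did not.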