Query regarding speed of training and issues with convergence

That’s expected:

Generally the Titan V is faster on this kind of task, so try training with only the V's first. The weight sync between cards can become a bottleneck that hides the true throughput of the V's: having the V's sit idle while the XP's finish the epoch before syncing is not a good thing. One way to test this is to make only the V's visible to the framework (see the sketch below).
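A minimal sketch of restricting training to the two Titan V's, assuming PyTorch and assuming the V's show up as devices 0 and 1 (check `nvidia-smi` on your machine and adjust the indices):

```python
# Make only the two Titan V's visible. This must be set before the
# framework initializes CUDA, i.e. before importing/using torch.cuda.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # adjust to your V's indices

import torch

# Sanity check: only the two V's should be visible now.
print(torch.cuda.device_count())              # expect 2
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(i))      # expect "TITAN V" twice
```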

Have you tried the NVIDIA-optimized container with automatic mixed precision (AMP) training, using only the two Titan V's?
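If you don't want to pull the container right away, you can get a similar effect with PyTorch's built-in `torch.cuda.amp`. A minimal sketch, with a placeholder model, data, and optimizer that you would swap for your own:

```python
import torch
import torch.nn as nn

# Placeholder model/data/optimizer just to keep the sketch self-contained;
# replace with your actual setup.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(32, 128), torch.randint(0, 10, (32,))) for _ in range(10)]

scaler = torch.cuda.amp.GradScaler()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()

    # Forward pass in mixed precision; fp16 matmuls let the Titan V's
    # Tensor Cores do the heavy lifting.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # Scale the loss so small fp16 gradients don't underflow, then step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

The Titan XP's have no Tensor Cores, so mixed precision is another reason to run this experiment on the V's alone.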