I’ve noticed a pattern where the stop loss jumps discontinuously when I decrease the number of decoder frames and the batch size during training. Has anyone else seen this?
As far as I can tell, it doesn’t noticeably degrade inference quality (note that the alignment and decoder loss scores aren’t affected much), but I’m curious whether there’s an explanation for it.