Impact of alignment of training data on model accuracy

Does anyone have experience with (or references to papers on) using training data that is not accurately aligned, and its impact on the quality of the trained model?

I’d like to fine-tune the 0.3.0 model with some custom data that has a few seconds of loud noise before or after the actual statement, but I’m concerned that the noise would be too disruptive.
