Two speakers in a single audio file

Hello everyone,
I am trying to train a model and have audio files in which two speakers repeat the same sentence. Splitting the audio files is a lot of manual work, and I am wondering whether it is possible to train the model with both speakers in the same audio file. I understand that the transcript would then have to repeat the text twice.
Would that help or hurt the training?