In many ways I worry more about the source of the sentences than about the same sentence being repeated by different speakers. I am going to train on the dataset for several epochs, so sentences are going to repeat in training no matter what. At least with multiple people saying a sentence, the repeats come with different voices. The overfitting argument loses out to the more-data argument for me. I can completely agree about segregating speakers, though. I made that mistake once. Wow, did those results look good, until I tested again in the wild and things were horrible. Gotta make mistakes to learn though!
Ideally I would like a different version of the data formatting where the train, test, and dev splits point to every recorded version of a repeated sentence, so that when you train you can cycle through the different speakers each time you fetch that sentence (roughly like the sketch below).
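Something along these lines is what I have in mind. This is just a rough Python sketch of the cycling idea, not any project's actual loader, and the (sentence, speaker_id, audio_path) metadata layout is made up for illustration:

```python
from collections import defaultdict

class SpeakerCyclingIndex:
    """Group every clip of the same sentence and rotate speakers per fetch."""

    def __init__(self, entries):
        # entries: iterable of (sentence, speaker_id, audio_path) tuples
        self.by_sentence = defaultdict(list)
        for sentence, speaker_id, audio_path in entries:
            self.by_sentence[sentence].append((speaker_id, audio_path))
        self.sentences = list(self.by_sentence)
        self.cursor = defaultdict(int)  # per-sentence round-robin position

    def __len__(self):
        return len(self.sentences)

    def fetch(self, idx):
        # Each fetch of the same sentence returns the next speaker's clip.
        sentence = self.sentences[idx]
        clips = self.by_sentence[sentence]
        speaker_id, audio_path = clips[self.cursor[sentence] % len(clips)]
        self.cursor[sentence] += 1
        return sentence, speaker_id, audio_path

# Example: the same sentence recorded by two speakers (paths are hypothetical)
index = SpeakerCyclingIndex([
    ("the cat sat", "spk1", "clips/a1.wav"),
    ("the cat sat", "spk2", "clips/b7.wav"),
    ("hello world", "spk1", "clips/a2.wav"),
])
print(index.fetch(0))  # first epoch: spk1's version
print(index.fetch(0))  # next epoch: spk2's version of the same sentence
```

That way a repeated sentence never hands the model the exact same audio twice in a row, while the split files still only need to list each sentence once.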