Conversational speeches as training data

Hi,

When adding a custom training dataset to fine-tune the pretrained model, I would like to know whether I should keep the following conversational speech phenomena in the training data:

  1. When a person repeats part of a word (a false start):
    Example: I want to buy a gene generator

  2. An unfinished or muffled word, where the intended word can be guessed from the context.

  3. The speaker laughs between words.

Will keeping these improve or decrease the overall recognition accuracy?

Thanks,

It really depends upon your end goal.

For example, generally we’ve removed laugh transcriptions and haven’t had any problems. But if you want to have laughs transcribed, by all means leave them in.
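If you do decide to strip these phenomena before training, a small preprocessing pass over the transcripts is usually enough. The sketch below assumes a transcript convention that is not stated in this thread: laughter and other non-speech events marked with bracketed tokens such as "[laughter]", and false starts written out in full as in the "gene generator" example above. Adapt the patterns to whatever annotation scheme your corpus actually uses.

```python
import re

# Assumption: non-speech events appear as bracketed tokens, e.g. "[laughter]".
NON_SPEECH = re.compile(r"\[[^\]]+\]")

def clean_transcript(text: str) -> str:
    """Remove bracketed non-speech markers, then drop a word that is a
    repeated prefix of the word that follows it ("gene generator" -> "generator").
    Note this is a heuristic: it will also drop e.g. "a" before "and"."""
    words = NON_SPEECH.sub(" ", text).split()
    kept = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if nxt is not None and nxt != word and nxt.startswith(word):
            continue  # false start: "gene" immediately before "generator"
        kept.append(word)
    return " ".join(kept)

print(clean_transcript("i want to buy a gene generator [laughter]"))
# -> i want to buy a generator
```

Whether you run such a pass depends on the end goal above: if you want laughter or disfluencies transcribed, leave the markers in and make sure your output alphabet covers them instead.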