When adding a custom training dataset to fine-tune the pretrained model, I would like to know whether I should keep the following kinds of conversational speech as training data:
The speaker repeats part of a word before saying it in full:
Example: I want to buy a gene generator
An unfinished or muffled word, where the intended word can still be guessed from the context.
The speaker laughs between words
Will keeping these utterances improve or reduce the overall recognition accuracy?