Conversational speeches as training data

Hi,

When adding a custom training dataset to fine-tune the pretrained model, I would like to know whether I should keep the following conversational speech phenomena in the training data:

  1. When a person repeats part of a word (a false start):
    Example: I want to buy a gene generator

  2. An unfinished or muffled word, where the intended word can be guessed from the context.

  3. The speaker laughs between words.

Will keeping these improve or decrease the overall recognition accuracy?

Thanks,

It really depends upon your end goal.

For example, generally we’ve removed laugh transcriptions and haven’t had any problems. But if you want to have laughs transcribed, by all means leave them in.
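If you do decide to strip these phenomena before training, a small preprocessing pass over the transcripts is usually enough. The sketch below assumes a transcript convention that is not stated in this thread: laughter and other non-speech events marked with bracketed tokens such as "[laughter]", and false starts written out in full as in the "gene generator" example above. Adapt the patterns to whatever annotation scheme your corpus actually uses.

```python
import re

# Assumption: non-speech events appear as bracketed tokens, e.g. "[laughter]".
NON_SPEECH = re.compile(r"\[[^\]]+\]")

def clean_transcript(text: str) -> str:
    """Remove bracketed non-speech markers, then drop a word that is a
    repeated prefix of the word that follows it ("gene generator" -> "generator").
    Note this is a heuristic: it will also drop e.g. "a" before "and"."""
    words = NON_SPEECH.sub(" ", text).split()
    kept = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if nxt is not None and nxt != word and nxt.startswith(word):
            continue  # false start: "gene" immediately before "generator"
        kept.append(word)
    return " ".join(kept)

print(clean_transcript("i want to buy a gene generator [laughter]"))
# -> i want to buy a generator
```

Whether you run such a pass depends on the end goal above: if you want laughter or disfluencies transcribed, leave the markers in and make sure your output alphabet covers them instead.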