Are single-word utterances better than sentences?

I believe labeled single-word audio recordings work better for speech recognition than datasets of multi-word clips.
Is there any research or evidence that supports this theory?
Why does the Common Voice project use entire sentences rather than single words?
Do you think single-word clips should be added? Are they equally important, and would they improve the overall CV dataset?

I disagree. In my view, your training material should be about the same as the input you want to recognize. If you only want to recognize single words, you could use single-word input. There is a Common Voice dataset of single numbers and letters.

But I am happy to be proven wrong; just because my approach works doesn’t mean there isn’t a better one :slight_smile:
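
If you want to try matching the training material to the target input, here is a rough sketch of how clips could be split by transcript length. It assumes a tab-separated file with `path` and `sentence` columns, as in the Common Voice TSV files. The function name `split_by_word_count` and the sample rows are made up for illustration.

```python
import csv
import io


def split_by_word_count(tsv_text, max_words=1):
    """Split clip paths into (short, long) by the word count of 'sentence'.

    Hypothetical helper: assumes tab-separated text with 'path' and
    'sentence' columns.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    short, long_ = [], []
    for row in reader:
        # Whitespace tokenization is a simplification; it will not work
        # for languages that do not separate words with spaces.
        n_words = len(row["sentence"].split())
        (short if n_words <= max_words else long_).append(row["path"])
    return short, long_


# Made-up sample rows, for illustration only.
SAMPLE_TSV = (
    "path\tsentence\n"
    "clip_001.mp3\tseven\n"
    "clip_002.mp3\tThe quick brown fox jumps.\n"
    "clip_003.mp3\tyes\n"
)

single_word, multi_word = split_by_word_count(SAMPLE_TSV)
print(single_word)
print(multi_word)
```

You could then train or evaluate on whichever subset looks like the input you expect at recognition time, and compare the results instead of guessing.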