I just found Common Voice, and I think it’s an amazing thing!
The power you give to speech researchers is wonderful.
My problem is that I just want raw audio data (I don’t care about validated transcriptions) in as many languages as possible for my research. Is it possible to download your audio data for all languages, including the clips that haven’t finished validation yet?
Yup, we make all audio available when we publish: validated, invalidated, and not-yet-validated clips. You can find all of them in the current published dataset for English.
That’s nice! I thought only validated audio was downloadable.
What I really want is all the audio (all languages, all speakers), and I don’t need the text at all, just the audio. Is it possible to download all samples, not just the English ones?