My name is Fredrik, and I’ve used Common Voice for my master’s thesis, “Language agnostic voice classification for conversational applications”, in which we classify the age and gender of the speakers in the Common Voice 7.0 corpus. To do so, we filtered out the voice clips without metadata, set a 15 s limit on clip length and kept at most five clips per speaker. This reduced the corpus to 74 languages, 43,255 unique speakers, 318 hours and 221,211 clips of recorded voice data. The result is a version of Common Voice that is easier to use for speech processing tasks other than ASR.
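For anyone who wants to try something similar, the three filtering steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis code: `filter_clips` and the `duration_s` field are hypothetical names, "metadata" is taken to mean the `age` and `gender` fields, clips over the limit are assumed to be discarded rather than truncated, and the five clips kept per speaker are simply the first five encountered.

```python
from collections import defaultdict


def filter_clips(rows, max_seconds=15.0, max_per_speaker=5):
    """Return the rows that pass all three filters, in input order.

    Each row is a dict with (assumed) keys: client_id, path, age,
    gender, and duration_s (a hypothetical per-clip duration in seconds).
    """
    kept = []
    per_speaker = defaultdict(int)  # clips kept so far per client_id
    for row in rows:
        # 1. Drop clips whose age or gender label is missing or empty.
        if not row.get("age") or not row.get("gender"):
            continue
        # 2. Drop clips longer than the limit (assumption: discard, not trim).
        if float(row["duration_s"]) > max_seconds:
            continue
        # 3. Keep at most max_per_speaker clips per speaker.
        speaker = row["client_id"]
        if per_speaker[speaker] >= max_per_speaker:
            continue
        per_speaker[speaker] += 1
        kept.append(row)
    return kept


# Made-up example rows, only to show the expected input shape.
rows = [
    {"client_id": "a", "path": "a1.mp3", "age": "twenties", "gender": "female", "duration_s": 4.2},
    {"client_id": "a", "path": "a2.mp3", "age": "", "gender": "female", "duration_s": 3.0},
    {"client_id": "b", "path": "b1.mp3", "age": "fifties", "gender": "male", "duration_s": 17.5},
]
kept = filter_clips(rows)
print([row["path"] for row in kept])
```

Choosing the first five clips per speaker keeps the sketch deterministic; sampling five at random per speaker would be an equally reasonable choice.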
I’m happy to share my research, my extensive data exploration and, of course, the data itself if anyone is interested in using it. It’s nothing crazy, but it might save researchers some valuable time.
Hope you are having a good day!
Kind regards,
Fredrik Lastow