As you might already know, Mozilla’s Common Voice project aims to unlock and open the voice ecosystem to all players by releasing a public domain dataset of voices in different languages that can be used freely, without compromising users’ privacy by sending data to the big tech giants.
In the coming months we want to advance English voice collection and take it to the next level. For a language to be usable by speech recognition engines, we need at least 2000 hours of voice collected from at least 1000 diverse people, and we think communities are the only ones that can enable this diversity.
That’s why we need Mozillians all over the world to contribute and get others to contribute. Our goal is to gather at least 100 additional validated hours in English by the end of June. We are focusing on English only this time because it’s the language with the most sentences available to read right now; we’ll focus on other languages in the near future.
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
3
Welcome @kehols. Please check the sprint page for details. You can help in many ways, from individual contributions, to a small activity with your friends/family, to an event at your local university. It’s up to you!
How diverse do the people need to be? Specifically, do you also need voice samples from non-native English speakers?
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
5
As diverse as possible (age, gender, pitch…). At this point we would like people who have a specific accent from an English-speaking country. We are in conversations to define a more solid accent strategy, but right now the site only offers native English accents.
To be clear, at this point you’re only interested in diverse people speaking English as a native language, is that right?
The message was a bit unclear, talking about countries where English is used as an everyday language (so by that definition a French person living in England could match), but I understand from your messages here that this wasn’t the intended meaning.
I’m mostly trying to relay the announcement as accurately as possible.