Common Voice 10 Dataset Release
On behalf of the Common Voice team at Mozilla: thank you all for your ongoing contributions. We have reached a massive milestone with our 10th dataset release.
This wouldn’t be possible without your ideas, thoughtfulness and patience.
You can access the dataset on our downloads pages.
We would love to hear about your favourite moments of being part of Common Voice. Please share your stories in the Typeform, as we would love to feature your words in a blog post. Alternatively, feel free to add your thoughts in the thread.
The Next 10
We have incrementally built technical and stewardship interventions to help language communities succeed, from introducing sentence collection bands that give languages a more equitable start on Common Voice to supporting the development of low-bandwidth voice data collection.
There is a lot more to do. What will the next 10 datasets look like?
We would love to get your feedback on the new dataset, so feel free to share it in the thread or on our GitHub.
P.S. Please note that if you have ideas that could further diversity, equity and inclusion, you can submit them to the open call for the Our Voices Competition.
We have also launched our voice Model and methods Competition, where you can win up to $2000 for submissions that help improve performance for gender, variant and accent, and methodology.
Join us today for the Community Call
Together with Francesca, we have organised a Community Call which will take place tomorrow at 16:00 UTC! We will show an example of how the Common Voice community creates real-life results.
We will learn how @dexter used a Common Voice-derived model to create offline voice message transcription in Signal Desktop.
All the details are here: https://mzl.la/3bnkwsM. If you have any questions, you can add them before or during the call here: https://bit.ly/3NFjQwu or join the #Community Room in Matrix to ask questions and interact with community members.