Common Voice 10th Dataset Release

heyhillary · July 7, 2022, 10:24am

Common Voice 10 Dataset Release

On behalf of the Common Voice team at Mozilla: thank you all for your ongoing contributions. We have reached a massive milestone with our 10th dataset release.

This wouldn’t be possible without your ideas, thoughtfulness and patience.

You can access the dataset on our downloads page.

Your Highlights

We would love to hear about your favourite moments of being part of Common Voice. Please share your stories in the Typeform, as we would love to feature your words in a blog post. Alternatively, feel free to add your thoughts in the thread.

The Next 10

Step by step, we have built technical and stewardship interventions to help language communities succeed, from introducing sentence collection bands that give languages a more equitable start on Common Voice to supporting the building of low-bandwidth voice data collection.

There is a lot more to do. What will the next 10 datasets look like?

We would love to get your feedback on the new dataset, so feel free to share it in the thread or on our GitHub.

P.S. Please note that if you have ideas that could further diversity, equity and inclusion, you can submit them to the open call for the Our Voices Competition.

We have also launched our voice Model and methods Competition, where you can win up to $2000 for submissions that help improve performance for gender, variant and accent, as well as methodology.

The competition will start in September. Register your interest today and read more on our blog.

Join us today for the Community Call

Together with Francesca, we have organised a Community Call which will take place tomorrow at 16:00 UTC! We will show an example of how the Common Voice community creates real-life results.

We will learn how @dexter used a Common Voice-derived model to create offline voice message transcription in Signal Desktop.

All the details are here: Community call: Offline voice message transcription in Signal Desktop - Mozilla Community Portal. If you have any questions, you can add them before or during the call here: https://bit.ly/3NFjQwu or join the #Community Room in Matrix to ask questions and interact with community members.