The Common Voice team is building something new, based on your feedback:
One of the most common pieces of feedback our team has received from the Common Voice community is a request to host and manage your own datasets collected outside the Common Voice platform. To meet this request, the Common Voice team is working to deliver Mozilla Data Collective, a platform that lets you and your community host and share your datasets on your own terms. You can learn more about Mozilla Data Collective here, and if you have a dataset you want to talk to us about now, you can get in touch with the team at mozilladatacollective@mozillafoundation.org
Mozilla Data Collective will supplement, not replace, Common Voice. We’ll be growing or launching the following programs and features on Common Voice, based on community feedback:
You’ve also asked for Datasheets for the Common Voice datasets:
Common Voice dataset users have asked for datasheets, and we’ve listened. A datasheet is supplemental information that accompanies a dataset and makes it more useful for developers and researchers to work with.
Because we know that language communities are the best experts on their language(s), we’ve created a process to contribute datasheets for your language. You can add a datasheet for your language’s scripted Common Voice dataset, for its Spontaneous Speech dataset, or one for each as needed.
You can add datasheets for your language here, and we’re always excited to answer your questions or take feedback.
You’ve asked for ways to customize data collection on Common Voice for your community:
We’ve also heard your feedback about wanting to build ways for your language communities to contribute to Common Voice that better fit your needs. At the end of September, we’ll be releasing a public API for Common Voice that allows developers to build applications that interact with and contribute to scripted datasets. The first iteration of this API will allow developers to fetch sentences from a specified text corpus, fetch audio clips from a specified language dataset, and contribute audio clips to a specified language dataset for Common Voice scripted mode. To encourage developers to build for their language communities, we’re taking proposals for funding to support the development of applications built on the API. More information can be found here.
You’ve asked for support for your community language work:
Common Voice has been built by language communities, activists, developers and researchers. To better support our community members, we’re running a contributor support program. The program is by application, and accepted contributors have access to honorarium funding to support their data collection and language community outreach efforts. To apply for the Common Voice Contributor Support program, please fill out this form.
We would love you to tell us what to do next!
Want to share your feedback, pain points or requests with the Common Voice team and community? You can join us for open office hours in September (25-09-2025 5pm GMT), October (23-10-2025 7am GMT) or November (20-11-2025 5pm GMT), or you can always reach us on Discourse, Discord or Matrix, or email the team at commonvoice@mozilla.com