Exciting news!
Over the last eight years, the Common Voice community has shared wishlists with us for ways to create, curate, and control their data that extend beyond our current platform capabilities. Examples include support for collecting and releasing datasets under licences other than CC-0, and the ability to contribute datasets collected outside Common Voice.
We’re so excited to announce that we have grown the team that developed Common Voice, and are expanding to build Mozilla Data Collective, a sister platform to enable you and your communities to share datasets in new ways. We’ll be building and releasing this platform throughout this year to help dataset owners and data creators to share their data with developers, researchers and others on their own terms.
We wanted to share our plans with our Common Voice community early on to make space for you to participate in shaping Mozilla Data Collective. We want the Collective to meet your needs, and support Common Voice thriving alongside it.
What does this mean for Common Voice?
More support and optionality for Common Voice community members! Mozilla Data Collective is designed to help bring the community new options for control over more types of their own data. The Common Voice platform will continue to have full-time engineering support focussed on improving what exists, and will remain accessible under the same MPL 2.0 licence. Common Voice will continue to be supported by Mozilla Foundation team members, our contributors and the wider community.
The existing Common Voice datasets will continue to be accessible under the CC-0 licence they were released with. Historical datasets will continue to be available through the Common Voice website. Future versions of the datasets will be released through Mozilla Data Collective.
What are the benefits for my language community?
The experience will be significantly improved. We will be:
- integrating robust, detailed datasheets about what is contained in the datasets
- adding programmatic access through a developer API
By keeping the services somewhat separate, we will be optimising for high speed and scalable performance, both issues on which we have received helpful feedback. We thank the Common Voice community for your valuable input.
As part of the work with Mozilla Data Collective, Common Voice will also be offering more granularity around licensing and access options to our data communities. These could include, for instance, the Creative Commons licences CC-BY and CC-BY-SA, or customised licences like the Nwulite Obodo (NOODL) licence.
Next steps:
Common Voice dataset users, contributors and community members don’t have to do anything right now. We’ll share more information about Mozilla Data Collective in this channel soon.
Questions and participation:
We would love to hear your thoughts. You can also chat to us about datasets you’d like to explore making available through Mozilla Data Collective (MDC). The Common Voice community will of course be getting early access! You can email the team at commonvoice@mozilla.com any time, in the language you’re most comfortable using.
Join us at the next open office hours session on 28th August 2025 if you would like to discuss Mozilla Data Collective or any other Common Voice topics with the team and community. See you there!