The Common Voice staff at Mozilla had the great opportunity to sit down with some of the community during Mozilla’s All Hands in Whistler. During the meetings we were able to have in-depth conversations about product needs, metrics, and where we see innovations to the product happening in the next six months. Community feedback and contribution are integral to this project, and we want to hear from those who were not in the room with us.
If you contribute to Common Voice regularly, you may have noticed some errors in sentences. We discussed putting up a reporting tool in Common Voice, and we have successfully launched it! You can now see a “Report” button in the speak and validate contribution sections. This allows people to alert us to grammar errors, words from a different language, or other inaccuracies.
One of the hot topics that came up was the community's need to better see and understand what their dataset looks like. This means not only how many people are contributing and how many clips are recorded, but also answering questions such as: What should we be focusing on in contributions? How many sentences do we have left to record? How many sentences are skipped? These are just a few of the many data points community members would like to be aware of. We are working to roll out an MVP of community data that we can build on to help continuously answer questions about language velocity and contribution.
Common Voice is actively looking to partner with organizations that also want to enhance their voice data collection. While the team is still deciding what these partnerships will look like, we are working to engage employees in donation and to have companies outside of Mozilla push outreach. The goal is to increase velocity in multiple languages. This is going to be one of our main focuses in the second half of the year.
Show Impact of the Data
The community would like to see what is possible with the data that we are collecting. This could be implementations in products or the ability to see a model in action using only the Common Voice dataset. This would make contribution feel more meaningful and give the community something tangible to work toward. We are currently evaluating the scope of this and working out whether it is something we can test quickly.
Build Best Practices for Working with the Community
The Common Voice team hears and understands that we need to be better about interacting with and including each part of the community. Over the past six months we have strived to make community engagement happier and healthier, and this will continue. We will work to expose our decisions for input more quickly, as well as give direction on which parts of Common Voice need the most help. This post is just one example of the kind of more regular and transparent communication we want to provide.
Wikipedia Data Extraction
A blocker for many languages getting online is the lack of available CC0 sentences. With the help of the community, we are now able to pull Wikipedia sentences in a way that allows them to be classified as CC0 content. This tool is still in progress and will be released to the community once it is ready.
We look forward to your continued feedback!
-The Common Voice Team