The Open Innovation Voice team has started biweekly meetings to review our work and call out blockers to our progress. We are excited to share updates with the community as part of this series and keep everyone up to date on what we are working on.
The previous engineer working on Common Voice has moved to another team, and we are excited to welcome Jenny Zhang to the project as lead engineer and Riley Shaw as a contractor. You will see them start to show up on GitHub and in other voice channels as they get settled in.
We are actively working with a small group of contributors to build a community metrics dashboard that will give active members of the community a view into many aspects of the project's metrics while still adhering to the privacy practices outlined in our terms of service.
This dashboard is being set up in Kibana and will provide information such as:
- Current data splits in real time; e.g. sex, age, accent distribution
- Contribution impact based on time frames or events; e.g. past campaigns set as milestones
- Quality of contribution:
  - Overall number of validated hours
  - Visualize how many validated hours are repetitions of the same sentence
  - Identify voice clip rejection rate
  - Identify how many clips have been reported, including filter by report attribute
  - Identify clips with multiple reports (from Listen)
- Sentence health:
  - Visualize the number of sentences a language has left for contribution
  - Identify how many sentences have been reported, including report attribute
  - Identify sentences with multiple reports (from Record)
We are improving how we capture stats directly in the Common Voice app so that communities can fully understand the impact of their individual events and campaigns in a way everyone can visualize.
Konstantina and Ruben ran an extremely successful campaign which brought in over 60,000 new contributors and a huge number of new hours.
The campaign on October 14th (email, snippets (a banner at the bottom of the Firefox new tab), and social) for English, German, French and Spanish was a success:
- German: +18 recorded +15 validated (5x)
- English: +65 recorded +48 validated (7x)
- French: +48 recorded +31 validated (6x)
- Spanish: +50 recorded +30 validated (15x)
- 11x in account creations
- Organic growth during week 2 is still higher: English 2x, Spanish 4x, French 1.5x, German 2.5x
Due to low engineering resourcing for the past 2-3 months, we have been unable to get all of the work done that we would like. We are ramping up our new engineers and engaging the community to help move things forward. By combining these efforts, we expect to regain our momentum in the coming months.
Much of the work we are doing right now is building out functionality for partner challenges. We are currently working on a pilot to see how different companies can work together to increase the velocity of data collection.
This will allow us to roll out advanced features that have been tested for Common Voice in the future. We are currently working with small teams from SAP and IBM to implement a pilot for an initial challenge. This will allow us to see what teams desire from the experience and understand the support needed for future iterations leading up to a full challenge release. We will keep you up to date on how this comes together and next steps as we move forward.
Internal IT Support
You may have noticed a Common Voice outage during the campaign two weeks ago, due in part to the huge influx of traffic. We discovered that our internal support tier needed to be upgraded and have worked with IT infrastructure to ensure we don’t have the same problem again.
We are working with contractors as well as our internal email team to implement features such as contribution reminders for custom goals and multi-language email support so we can start to email people in a larger set of languages.
We’re creating two more campaign opportunities this year as we work to grow the dataset across languages and improve the rate of contribution to Common Voice overall.
Due to resourcing, we are currently reviewing when the next dataset release will be and will update the community soon on the timeline.
Emma Irwin, our colleague on the Open Innovation team, did great research on how people are using data and the improvements we could make. We expect to be able to release the research to the community by the end of November.