I’m happy to share that we are preparing everything for our next Common Voice dataset release and we want to make sure we can include as many validated hours as possible to increase its quality and usefulness.
Our goal is to release the latest data on approximately June 30th, 2020. The release of a new dataset requires some preparation and the Common Voice team is planning to initiate compilation of the latest data on June 22nd, 2020. This is considered the cut-off date for recorded and validated data to be included with the next dataset release.
Most languages have a significant number of recorded hours still waiting to be validated. We encourage everyone to focus their energy, and their communities, on validating as much as possible before June 22nd. This will allow these hours to be released in the latest version of the dataset.
More validated data will give researchers and people training speech recognition models more to work with when training initial models in your languages. It will also help attract more people to contribute to the project.
How can you help?
Please read and share the following community guidelines to learn how best to validate voice recordings.
Talk with your community and explain why having as many validated hours as possible by the end of June is important. Tell them how to create a profile on the site. Set a personal goal and review the validation guidelines (you might want to localize this topic and the guidelines, then publish them on your language's Discourse).
Encourage fun activities to get people validating for a few minutes every day, and make some noise in your community and on social networks.
Thanks, everyone, for your contributions!