The Common Voice team is so excited to release the Common Voice 17.0 dataset, made possible by our voice and text corpus contributors, language community activists, open source contributors and countless other community members. Thank you all so much for making this possible.
The Common Voice speech corpus is now a dazzling 31,000 hours of speech clips, an increase of 847 hours since our last release. This release also adds 493 hours of validated clips!
Clips in Haitian Creole, Nso, Zulu and Zaza join the Common Voice dataset for the first time with this release.
This dataset includes data collected through March 14th, 2024. Data collected after that date will be included in the next dataset release.
Dataset releases are quarterly, and we expect to see 18.0 released in June 2024.