We’re really excited to announce that Common Voice is adding a new feature set for contributing spontaneous speech. It should go live in the first quarter of 2025 and will involve the creation of a second, parallel CC0 dataset!
A sample dataset from the Spontaneous Speech Alpha pilot will be released in early 2025. The core Common Voice dataset will not change: you can expect to see the same data you’ve already been using for your development and research, with more clips, validations and languages added.
More information is available here: https://foundation.mozilla.org/en/blog/common-voice-navbar-changes-and-spontaneous-speech/
I’m also happy to answer any questions you might have.