Data distribution among sets

Hey @PKlumpp,

"Each test/train/dev set is generated non-deterministically, meaning that they will vary from release to release even for minor updates. This is to avoid reproducing and perpetuating any demographic skews in each subsequent set. "

For more details on the metadata, please check out the GitHub repository common-voice/cv-dataset (Metadata and versioning details for the Common Voice dataset).

I would love to learn more about your project. If you'd like, please feel free to share it on this thread: Talk to us! How are you using Common Voice?
