I calculated statistics on the unique sentences here: "Common Voice v1 corpus design problems, overlapping train/test/dev sentences". The v1 release contains few unique sentences, only around 7000. What’s worse, dev and test reuse the same sentences, so they overlap nearly 100% with train.
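For anyone who wants to run the same kind of check on their own copy, here is a minimal sketch of counting unique sentences and measuring how much of dev/test also appears in train. It assumes the transcripts are already loaded as plain lists of strings. The function name `sentence_overlap` and the toy sentences are made up for illustration; they are not the real Common Voice data and not necessarily how the linked numbers were produced.

```python
def sentence_overlap(train, other):
    """Fraction of unique sentences in `other` that also occur in `train`."""
    train_set = set(train)
    other_set = set(other)
    if not other_set:
        return 0.0
    return len(other_set & train_set) / len(other_set)


# Toy stand-ins for the transcript columns of each split (hypothetical data).
train = ["the cat sat", "hello world", "good morning", "the cat sat"]
dev = ["hello world", "good morning"]
test = ["the cat sat", "a new sentence"]

# Unique sentences across all three splits.
unique_all = set(train) | set(dev) | set(test)
print(len(unique_all))                # 4

# Share of dev/test sentences that are also in train.
print(sentence_overlap(train, dev))   # 1.0
print(sentence_overlap(train, test))  # 0.5
```

An overlap close to 1.0 for dev or test means those splits mostly measure how well a model memorized the training prompts, not how well it generalizes to unseen text.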
bmilde
(milde)