Prompt design

bmilde · January 10, 2018, 7:21pm

I have calculated statistics on the unique sentences here (Common Voice v1 corpus design problems, overlapping train/test/dev sentences). There aren’t many in the v1 release, only around 7000. What’s worse, the same sentences are reused in dev and test, so both overlap nearly 100% with train.
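
For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the overlap between splits could be measured. It is not the script used for the numbers above. The function name `overlap_stats`, the lowercase/strip normalization, and the toy sentences are all assumptions for illustration; for real data you would load the transcript column of each split into a list first.

```python
def normalize(sentence):
    # Assumption: sentences count as identical if they match after
    # trimming whitespace and lowercasing. Adjust to taste.
    return sentence.strip().lower()


def overlap_stats(train_sentences, other_sentences):
    """Count unique sentences in two splits and how many of them are shared."""
    train_set = {normalize(s) for s in train_sentences}
    other_set = {normalize(s) for s in other_sentences}
    shared = train_set & other_set
    # Fraction of the other split's unique sentences that also occur in train.
    fraction = len(shared) / len(other_set) if other_set else 0.0
    return {
        "unique_train": len(train_set),
        "unique_other": len(other_set),
        "shared": len(shared),
        "overlap_fraction": fraction,
    }


# Toy data (hypothetical), standing in for the train and dev transcripts.
train = ["The cat sat.", "Hello world.", "the cat sat. ", "Good morning."]
dev = ["Hello world.", "Good morning."]

stats = overlap_stats(train, dev)
print(stats)
```

An `overlap_fraction` close to 1.0 for dev or test means that almost every prompt in that split was also seen in train, which is the situation described above.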
