Prompt design

I have calculated statistics on the unique sentences here (Common Voice v1 corpus design problems, overlapping train/test/dev sentences). There aren’t many in the v1 release, only around 7000. Worse, the same sentences are reused in dev and test, so those sets overlap nearly 100% with train.
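For anyone who wants to check this kind of figure themselves, here is a minimal sketch of how the unique-sentence count and the split overlap could be computed. It is not the script behind the numbers above. It assumes each split is a CSV file with the prompt in a column named `text`; the column name and the helper names `normalize`, `load_sentences` and `overlap_fraction` are my own choices for illustration.

```python
import csv


def normalize(sentence):
    """Lowercase and collapse whitespace so trivial variants count as one prompt."""
    return " ".join(sentence.lower().split())


def load_sentences(csv_path, column="text"):
    """Return the set of distinct prompts in one split file.

    Assumption: the file is a CSV with a header row and the prompt
    stored in `column`; adjust the name to match your copy of the data.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return {
            normalize(row[column])
            for row in csv.DictReader(f)
            if row.get(column)
        }


def overlap_fraction(train, other):
    """Fraction of distinct sentences in `other` that also occur in `train`."""
    if not other:
        return 0.0
    return len(other & train) / len(other)
```

Calling `load_sentences` on each split file and taking `len()` of the result gives the number of unique prompts per split, and `overlap_fraction(train, dev)` gives the share of dev prompts already seen in train. A value close to 1.0 would match the near-total overlap described above.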