All the sentences I get in French are addresses, both when speaking and when validating. I looked at the French corpus in the repo and I see many other things there, such as political speeches, text from novels, etc.
So I wonder: how are sentences selected? It cannot be fully random, unless the server hasn't updated its pool of sentences for a while. Am I correct?
By the way, I think it would be less monotonous if we got different kinds of sentences to read, picked at random.
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
The strings offered for recording are those with the fewest recordings. I guess we pushed too many addresses, and thus the dataset is kind of imbalanced now.
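To illustrate why this leads to a long run of addresses, here is a minimal Python sketch. It assumes the server simply ranks sentences by how many clips they already have; the function name `pick_sentences` and the sample data are hypothetical and not taken from the actual Common Voice code.

```python
import heapq


def pick_sentences(clip_counts, n):
    """Return the n sentences with the fewest recordings.

    Ties are broken alphabetically so the result is deterministic.
    """
    return heapq.nsmallest(n, clip_counts, key=lambda s: (clip_counts[s], s))


# Hypothetical counts: older novel and speech sentences already have clips,
# while a large batch of newly imported addresses has none yet.
clip_counts = {
    "Il pleuvait ce soir-là.": 3,
    "Monsieur le président, je vous remercie.": 2,
    "12 rue de la Paix, Paris": 0,
    "4 avenue Foch, Lyon": 0,
    "8 place Bellecour, Lyon": 0,
}

# Every selected sentence is an address, because addresses have the fewest clips.
print(pick_sentences(clip_counts, 3))
```

Under that assumption, contributors would keep seeing addresses until those catch up with the rest of the corpus in number of recordings.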
I had the same feeling of doing nothing but stupid addresses during my first two days on Common Voice. Then it changed, and now I get strange and funny sentences. Much better! Does everybody get the same “work profile”?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
We might benefit from contributions that increase the size of the dataset of sentences to read; that would help mitigate this poor first experience.