Hi Daniele,
Thanks for the info.
It’s totally clear that, first of all, for languages with scarce data, like Italian, we need more recordings from people to improve the language models and lower the error rate. Yes, we need a lot of outreach to get people to use the Common Voice portal as much as possible.
On my side (I’m new here), I’ll publish an article about CV on my blog in November: https://www.convcomp.it, and I am willing to talk about the topic (Mozilla CV, Mozilla DeepSpeech, Mozilla TTS) at Italian conferences, meetups, etc.
Back to bad-word management:
What do you mean by "scraper support"?
Anyway, I imagine two phases:
- The collection of sentences containing “bad words”:
these sentences would be flagged and kept in a set distinct from the “without bad words” set (you say “two lists”).
- The CV web portal user experience:
my proposal here is a change request on the UX.
By default, users are not allowed to read sentences with bad words, OK!
But a user (maybe a registered user with a profile) can opt in, bypassing the bad-word filter (by saying/clicking somewhere in the profile: “I take all / I accept bad words”).
BTW, for users under 19, I imagine a similar opt-in flow (I have to think more about it).
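To make the two phases a bit more concrete, here is a rough Python sketch of what I have in mind. Everything in it is hypothetical: the word list, the function names, and the `accepts_bad_words` profile flag are placeholders of mine, not anything taken from the Common Voice code base, and the matching is a naive whole-word lookup.

```python
import re
from dataclasses import dataclass

# Hypothetical placeholder list; a real one would be curated per language.
BAD_WORDS = {"badword", "worseword"}

_TOKEN = re.compile(r"\w+")


def contains_bad_word(sentence, bad_words=BAD_WORDS):
    """Return True if any whole word of the sentence is in the bad-word set."""
    return any(tok.lower() in bad_words for tok in _TOKEN.findall(sentence))


def split_sentences(sentences, bad_words=BAD_WORDS):
    """Phase 1: keep flagged sentences in a set distinct from the clean one."""
    clean, flagged = [], []
    for sentence in sentences:
        target = flagged if contains_bad_word(sentence, bad_words) else clean
        target.append(sentence)
    return clean, flagged


@dataclass
class UserProfile:
    # Hypothetical opt-in flag; off by default, so the filter stays active.
    accepts_bad_words: bool = False


def sentences_for(user, clean, flagged):
    """Phase 2: only users who opted in are offered the flagged sentences."""
    return clean + flagged if user.accepts_bad_words else list(clean)
```

For example, `sentences_for(UserProfile(), clean, flagged)` would return only the clean list, while a profile created with `accepts_bad_words=True` would get both lists. The under-19 case is left out on purpose, since I still have to think about it.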
giorgio