Sentences analysis on main languages - Action needed for the ones with deficit

nukeador · July 29, 2019, 4:56pm

We haven’t put a hard lock on recordings on the site because some communities asked to be able to keep collecting voices for other uses (not Deep Speech), and also because it’s only recently (late last year) that we got more clarity on the number of recordings per sentence needed to train models.

That’s why we have always been pushing for communities to get more sentences (through the sentence collector and the Wikipedia extraction work). Recordings of the same sentence are not lost (they are part of the dataset); it’s just that they are not as useful as a recording of a new sentence.

We are going to be working with the Deep Speech team to fully understand this and propose the necessary changes that help their goals, while balancing that with what communities are asking for.

What do you mean by “non-used” here? For these languages with a deficit, all sentences have already been recorded at least once.

Cheers.