Hello,
How accurate are the statistics for recorded/validated clips per language? People are recording and validating, but the numbers do not reflect this. Is there a delay in processing the data that I should be aware of?
These numbers are important to me for managing human resources for the project.
It would also be very useful if the number of remaining sentences (unrecorded, or recorded but not yet validated) were available.
Here is a screenshot:
I believe there is a delay. I think the numbers are recalculated every 24h or so, but @phirework will have a better idea.
I would love to have this as well. The languages page would be perfect for this. For unpublished languages there is already a bar for sentence collection. Why not keep something similar for all languages?
To have these statistics on the language page would be ideal.
Part of these statistics can be seen on the Sentence Collector’s page: https://commonvoice.mozilla.org/sentence-collector/. What’s missing is the number of sentences that have been recorded and validated (the sentences that have supposedly left the pool).
This could be calculated: Request: Number of not recorded sentences by language
Note that the statistics on the Sentence Collector only include what it knows about. Anything added outside of it is not counted, so they would be missing extracts from Wikipedia made through the Sentence Extractor, as well as bulk uploads such as the Europarl corpus in several languages.
Then it would be useful to have statistics similar to the sentence collector on the active CV Pool.
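For what it’s worth, here is a rough sketch of the calculation discussed above, assuming per-language counts for the whole active pool (all sources, not only the Sentence Collector) and for sentences already recorded could be obtained. The function name and the numbers are hypothetical and do not come from the Common Voice database or any real API.

```python
def remaining_sentences(pool_total, recorded):
    """Return the number of not-yet-recorded sentences per language.

    pool_total: dict mapping language code -> sentences in the active pool
                (assumed to include bulk imports and Wikipedia extracts)
    recorded:   dict mapping language code -> sentences already recorded
    """
    remaining = {}
    for lang, total in pool_total.items():
        done = recorded.get(lang, 0)  # languages with no recordings count as 0
        # Clamp at zero in case the two counts were taken at different times
        remaining[lang] = max(total - done, 0)
    return remaining


# Made-up numbers, for illustration only
pool_total = {"ca": 12000, "eu": 8000}
recorded = {"ca": 4500}
print(remaining_sentences(pool_total, recorded))
```

If the pool total came only from the Sentence Collector, the result would undercount the remaining sentences for languages with bulk uploads, which is the caveat mentioned above.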