How accurate are the statistics of Recorded/Validated clips per language?

daniel.abzakh · July 1, 2021, 10:31am

Hello,
How accurate are the statistics of Recorded/Validated clips per language? People are recording and validating, but the numbers do not reflect that. Is there a delay in processing data that I should be aware of?
These numbers are important to me, in order to manage human resources for the project.
It would also be very useful if the number of remaining sentences (unrecorded, and recorded but not yet validated) were available.
Here is a screenshot:

ftyers · July 1, 2021, 3:48pm

I believe there is a delay. I think the numbers are recalculated every 24h or so, but @phirework will have a better idea.

stergro · July 1, 2021, 8:45pm

I would love to have this as well. The languages page would be perfect for this. For unpublished languages there is already a bar for sentence collection. Why not keep something similar for all languages?

daniel.abzakh · July 4, 2021, 10:18am

To have these statistics on the language page would be ideal.
Part of these statistics can be seen on the sentence collector’s page (Common Voice). What’s missing is the number of sentences that have been recorded and validated (the sentences that have supposedly left the pool).
This could be calculated: Request: Number of not recorded sentences by language

mkohler · July 5, 2021, 10:08am

Note that the statistics on the SentenceCollector only include what it knows about. Anything that’s added outside of it is not counted. So this would be missing extracts from Wikipedia through the Sentence Extractor as well as bulk uploads such as the Europarl corpus in several languages.

daniel.abzakh · July 5, 2021, 10:21am

Then it would be useful to have statistics similar to the sentence collector on the active CV Pool.
