Lately, I’ve been working across all the datasets, computing statistics on health, diversity, etc. Whenever I included the vote matrix (up vs. down) in the system, the data got huge. On closer inspection, I saw that quite a few languages (the larger, more active datasets) have very high vote counts on some recordings. E.g., this is for English:
As you can see, there are recordings with more than 1,000 votes…
The +2-vote validation system works well for small counts: 0-2, 1-3, 2-4… But if a recording reaches 100-102, that should point to a problem with the recording itself, even if it is eventually validated.
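To make the problem concrete, here is a minimal sketch of the validation rule described above, plus the extra check I'm proposing; the function names and the `max_down=5` threshold are my own illustrative choices, not anything from the actual codebase:

```python
def is_validated(up, down):
    # Rule as described above: a recording is accepted once it has
    # 2 more up-votes than down-votes, regardless of the totals.
    return up - down >= 2

def is_contested(up, down, max_down=5):
    # Proposed extra check: too many down-votes signals a problem
    # even if the +2 margin is eventually reached.
    return down > max_down

print(is_validated(2, 0))      # normal case: validated
print(is_validated(102, 100))  # also "validated", after 202 listens
print(is_contested(102, 100))  # but flagged by the proposed rule
```

The point is that the current rule only looks at the margin, never at the total vote count, so a clearly problematic 100-102 recording passes exactly like a clean 0-2 one.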
I noticed this while examining Turkish recordings as I was writing moderation software. In v11.0, we suddenly got a 6-8 tie-breaker (validated) recording where the speaker actually inserted an extra word into a long sentence; because the result is a common phrase, you can easily miss it. Even though I was listening carefully to find the mistake, I only caught it on my third listen.
So, if we have (in validated.tsv) 100 recordings with 100 up-votes each, each of them must also have 98 down-votes - and those recordings are most probably not right.
That means 100x100 + 100x98 = 19,800 listening sessions, where only 2 up-votes per recording (200 in total) would have sufficed: a huge loss of volunteer time, and the result is probably still wrong.
The best solution I can think of is to stop feeding out recordings with down-votes > N (e.g. 5, 10, whatever) and flag them for professional review (e.g. by the community core).
A second option is to move them to the bottom of the queue, so they only get listened to when the others run out (easily implemented in SQL with SELECT … ORDER BY up + down ASC).
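The second option can be sketched with an in-memory SQLite database; the `clips` table and its column names are made up for illustration, not the real schema:

```python
import sqlite3

# Hypothetical clips table: id, up-votes, down-votes.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clips (id INTEGER, up INTEGER, down INTEGER)")
con.executemany(
    "INSERT INTO clips VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 1, 1), (3, 100, 98)],  # clip 3 is the contested one
)

# Fewest total votes first, so contested recordings sink to the bottom
# of the listening queue instead of soaking up more sessions.
queue = con.execute("SELECT id FROM clips ORDER BY up + down ASC").fetchall()
print([row[0] for row in queue])  # [1, 2, 3]
```

Sorting ascending on the vote total means fresh, low-vote clips get served first, and the 100-98 style recordings are only reached once everything else is exhausted.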
What do you think?