Recordings are never validated

issue

(Mike Sheldon) #1

Since the introduction of the new dashboard (which is great, by the way), I’ve become aware that fewer than 20% of the recordings I’ve submitted over the past year have actually been validated (296 out of 1594). The ones that have been validated have a nearly 100% acceptance rate, so it’s not that they’re being rejected; they just never seem to get checked at all.

Looking at other users on the leaderboard shows a couple of people in the same boat as me, but almost all other users have the majority of their recordings verified.

Any ideas what might be the cause behind this?


#2

What language are you recording in?


(Mike Sheldon) #3

I’m recording in English


(Pedro Lima) #4

How do you know that your recordings were never checked? I’ve seen a lot of people with a 100% acceptance rate, people with around 150 to 300 recordings. Also, people with more clips verified must have older recordings; I suppose the clips are in a queue.


(Mike Sheldon) #5

You can see in the dashboard how many clips have been recorded vs how many have actually been verified:

This shows 1594 total recordings, but only 296 verified ones, with an acceptance rate of 99.66%, so most of the remaining 1298 haven’t been rejected, but neither have they been verified.
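The dashboard figures can be cross-checked with a little arithmetic. This Python sketch assumes the acceptance rate means accepted clips divided by verified clips; under that assumption, 99.66% of 296 works out to 295 accepted and 1 rejected, while 1298 clips (about 81% of the total) are still unverified.

```python
# Figures quoted from the dashboard above.
total_recorded = 1594
verified = 296
acceptance_rate = 0.9966  # shown as 99.66%

unverified = total_recorded - verified
verified_share = verified / total_recorded

# Assumption: acceptance rate = accepted clips / verified clips.
accepted = round(verified * acceptance_rate)
rejected = verified - accepted

print(f"unverified: {unverified}")              # unverified: 1298
print(f"verified share: {verified_share:.1%}")  # verified share: 18.6%
print(f"accepted: {accepted}, rejected: {rejected}")  # accepted: 295, rejected: 1
```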

For comparison there are users with around 5000 recordings with the majority verified, so it’s not just that there’s a large number of them:

And my recordings are spread out over the space of more than a year, so there’s not suddenly a lot of new ones that need checking all at once.


(Pedro Lima) #6

I see, that’s still a lot of clips to be verified, almost 40%.


(Lissyx) #7

The general stats for English show 400 hours not yet validated, and that number is increasing steadily. Is it possible that we just don’t have enough contributors to close the validation gap?


(Gregor) #8

Heya,

thanks for reporting this! I just did a manual lookup in the DB and it says that only 300 of your clips still need votes. So there must be something wrong with the query I wrote for displaying the amount of valid clips. I’ve opened an issue for it: https://github.com/mozilla/voice-web/issues/1741


(Mike Sheldon) #9

Ah, that’s good to know; thanks very much for looking into it :slight_smile:


(Gregor) #10

Oh no, my last message was off. I accidentally queried your votes instead of your clips (the votes didn’t have the disparity). So the DB confirms that your clips are massively under-validated. I’ll investigate further! :male_detective:

As suggested above, we do have 200k not-yet-validated clips in English, and there’s a certain amount of randomness in which clips get validated. I’m no statistician, but this might be well within the bounds of (mis)fortune. I know that’s not a satisfying answer; I’ll keep pondering how we can do better here.
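One rough way to ask whether this is just bad luck is a binomial tail probability. The sketch below assumes, purely for illustration, that every clip is validated independently with the same probability p. That is not how the real queue works (clips are served in an order), and p is a hypothetical parameter, not a figure from the Common Voice database; the function name `binomial_cdf` is likewise illustrative.

```python
import math


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p).

    Each term is computed in log space with math.lgamma so that large n
    does not overflow the binomial coefficient.
    """
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0  # X is always 0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0  # X is always n
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(min(k, n) + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)  # tiny terms simply underflow to 0.0
    return min(total, 1.0)


# Hypothetical p: "a majority of clips get validated" modeled as a coin flip.
print(binomial_cdf(296, 1594, 0.5))  # far below 1e-50
```

If the result is tiny for every plausible p, plain bad luck becomes an unlikely explanation.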


(Mike Sheldon) #11

No worries, thanks for investigating :slight_smile:


#12

Does the validation queue serve clips in order of most recent first? That would explain why some clips never get validated: reviewers never reach them because newer clips keep pushing them back.


(Gregor) #13

It’s actually the other way around, oldest clips that still need votes are served first. So the more time passes the more likely it should be that your clip is picked.
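To see why the serving order matters for the question in #12, here is a toy Python simulation. It is a hypothetical model, not the voice-web code: each day a fixed number of clips arrives, reviewers can validate a fixed number, and each clip needs only one vote. When reviewers can’t keep up, newest-first leaves the very first clips waiting forever, while oldest-first bounds how long any clip waits.

```python
from collections import deque


def simulate(policy: str, days: int, new_per_day: int, validations_per_day: int) -> int:
    """Return how many days the oldest still-waiting clip has waited at the end.

    Toy model (assumption): constant arrivals and reviewer capacity, and every
    clip is done after a single vote.
    """
    queue = deque()  # recording day of each waiting clip, oldest on the left
    for day in range(days):
        queue.extend([day] * new_per_day)  # today's recordings join the queue
        for _ in range(min(validations_per_day, len(queue))):
            if policy == "oldest_first":
                queue.popleft()  # serve the clip that has waited longest
            elif policy == "newest_first":
                queue.pop()  # serve the clip recorded most recently
            else:
                raise ValueError(f"unknown policy: {policy!r}")
    last_day = days - 1
    return last_day - queue[0] if queue else 0


# Reviewers handle 8 clips a day while 10 new ones arrive.
print(simulate("oldest_first", 100, 10, 8))  # 19
print(simulate("newest_first", 100, 10, 8))  # 99
```

With oldest-first, the backlog still grows, but it consists only of the most recent clips; with newest-first, the clips from day 0 are never reached.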

I just ran a query to find out how long it took, on average, for clips to be validated in each language (only for clips from the last 6 months):

+--------+----------+
| locale | avg      |
+--------+----------+
| en     |  12.4523 |
| de     |   0.7007 |
| fr     |   1.1748 |
| cy     |   5.0992 |
| br     |  10.3343 |
| cv     | 127.4138 |
| tr     |  49.4842 |
| tt     |   8.1130 |
| kab    |   1.1101 |
| ky     |  27.1614 |
| ga-IE  |   8.5747 |
| sl     |  80.3419 |
| ca     |  23.1162 |
| it     |  33.2519 |
| zh-TW  |  52.0855 |
| eo     |   8.6680 |
| nl     |  13.9507 |
| cnh    |   2.9096 |
| et     |   0.8576 |
+--------+----------+

This avg doesn’t include the not-yet-validated clips, as I don’t know how to average infinity.
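The actual query isn’t shown. As a rough Python equivalent, assuming each clip is a hypothetical `(locale, created_at, validated_at)` record where `validated_at` stays `None` until the clip is validated, the aggregation behind the first table could look like this; unvalidated clips are skipped, as described above.

```python
from collections import defaultdict
from datetime import datetime
from typing import Optional

# Hypothetical record shape: (locale, created_at, validated_at or None).
Clip = tuple[str, datetime, Optional[datetime]]


def avg_days_to_validation(clips: list[Clip]) -> dict[str, float]:
    """Mean days from recording to validation per locale, skipping unvalidated clips."""
    total_days = defaultdict(float)
    counts = defaultdict(int)
    for locale, created_at, validated_at in clips:
        if validated_at is None:
            continue  # still waiting: no finite duration to average
        total_days[locale] += (validated_at - created_at).total_seconds() / 86400
        counts[locale] += 1
    return {locale: total_days[locale] / counts[locale] for locale in counts}


# Toy data, made up for illustration.
sample = [
    ("en", datetime(2019, 1, 1), datetime(2019, 1, 11)),
    ("en", datetime(2019, 1, 1), datetime(2019, 1, 21)),
    ("en", datetime(2019, 1, 1), None),
    ("de", datetime(2019, 1, 1), datetime(2019, 1, 2)),
]
print(avg_days_to_validation(sample))  # {'en': 15.0, 'de': 1.0}
```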

Update: Okay, I guess this is how to capture it. Here’s the average time a clip lies around without being validated (this time including not-yet-validated clips):

+--------+---------+
| locale | avg     |
+--------+---------+
| en     | 42.3608 |
| de     |  0.7005 |
| fr     |  1.1681 |
| cy     |  4.8960 |
| br     | 24.1628 |
| cv     | 64.8546 |
| tr     | 58.6405 |
| tt     |  9.7254 |
| kab    |  1.1066 |
| ky     | 31.8533 |
| ga-IE  |  8.5368 |
| sl     | 51.6949 |
| ca     | 25.5847 |
| it     | 31.2514 |
| zh-TW  | 91.5607 |
| eo     | 22.1319 |
| nl     | 19.7100 |
| cnh    | 20.5178 |
| et     |  0.7486 |
+--------+---------+
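The second table handles still-open clips differently. A minimal Python sketch, assuming the same hypothetical `created_at`/`validated_at` fields: a clip that is not yet validated counts as waiting up to the moment the query runs (`now`). That duration is only a lower bound on how long such a clip will eventually wait, so including open clips can pull the mean well above the validated-only figure.

```python
from datetime import datetime
from statistics import mean
from typing import Optional


def waiting_days(created_at: datetime, validated_at: Optional[datetime], now: datetime) -> float:
    """Days a clip waited for validation; open clips are measured up to `now`."""
    end = validated_at if validated_at is not None else now
    return (end - created_at).total_seconds() / 86400


# Toy data, made up: two validated clips and one still waiting.
now = datetime(2019, 3, 1)
clips = [
    (datetime(2019, 1, 1), datetime(2019, 1, 11)),
    (datetime(2019, 1, 1), datetime(2019, 1, 21)),
    (datetime(2019, 1, 1), None),
]
print(round(mean(waiting_days(c, v, now) for c, v in clips), 2))  # 29.67
```

Dropping the open clip instead would give a mean of 15.0 days for the same toy data.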

(Mike Sheldon) #14

That’s interesting. What are the units for the avg, days?


(Gregor) #15

Yup, good point. Those are days.


(Gregor) #16

This might be interesting: a plot of how long clips lie around unvalidated (the second table I posted above showed the mean of that data):

My interpretation: the box shows how many days a language community usually needs to validate clips, and the whiskers show the range (i.e. some periods are more active than others). I guess one could also infer something about a community’s interest in validating versus recording.
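For readers less familiar with box plots, here is a Python sketch of the summary such a plot draws, using toy numbers. It takes the box as the first to third quartile and, following the reading above, the whiskers as the minimum and maximum. Plotting libraries don’t always draw whiskers that way: matplotlib’s `boxplot`, for example, by default ends them at the furthest point within 1.5 times the interquartile range and marks anything beyond as outliers. Which settings the plot above used isn’t stated, and `box_summary` is a hypothetical helper.

```python
from statistics import quantiles


def box_summary(days: list[float]) -> dict[str, float]:
    """Five-number summary behind a box plot (whiskers taken as min/max here)."""
    # method="inclusive" treats the data as the whole population and
    # interpolates linearly between data points.
    q1, median, q3 = quantiles(days, n=4, method="inclusive")
    return {"min": min(days), "q1": q1, "median": median, "q3": q3, "max": max(days)}


# Toy data, made up: most clips are validated within days, one waited a month.
print(box_summary([0.5, 1, 1, 2, 30]))
# {'min': 0.5, 'q1': 1.0, 'median': 1.0, 'q3': 2.0, 'max': 30}
```

A short box with one long whisker, as here, is the shape you would expect from a community that usually keeps up but occasionally has quiet periods.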

Back to the topic: I’m not yet sure whether that explains it. I’ll keep looking :male_detective:


#17

Do you have a way of determining if the unvalidated clips of the OP are old or recent? That would help figure out if there’s a validation problem or not.
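One way to answer this, sketched in Python with a hypothetical `(created_at, validated_at)` record per clip: count the OP’s still-unvalidated clips by the month they were recorded. If the counts pile up in old months, an oldest-first queue isn’t reaching them, which points to a problem; if they are mostly recent, an ordinary backlog is the likelier explanation.

```python
from collections import Counter
from datetime import datetime
from typing import Optional


def unvalidated_by_month(clips: list[tuple[datetime, Optional[datetime]]]) -> Counter:
    """Count still-unvalidated clips by the month ("YYYY-MM") they were recorded in."""
    return Counter(
        created_at.strftime("%Y-%m")
        for created_at, validated_at in clips
        if validated_at is None  # only clips that are still waiting
    )


# Toy data, made up for illustration.
sample = [
    (datetime(2018, 3, 5), None),
    (datetime(2018, 3, 20), None),
    (datetime(2018, 3, 21), datetime(2018, 4, 1)),
    (datetime(2019, 1, 2), None),
]
for month, count in sorted(unvalidated_by_month(sample).items()):
    print(month, count)  # prints "2018-03 2", then "2019-01 1"
```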


(Mike Sheldon) #18

Thanks for continuing to investigate this. The info I can add from my side is that, while I’m not sure exactly what proportion was recorded when, I’ve tended to record in little clusters throughout the last year, e.g. 100 clips in a week and then a month of not recording anything.

They were also recorded across multiple devices before the new profile system existed; the contributions from each device seemed to be merged correctly when I logged in on that device after the profile system was introduced.


(Gregor) #19

Always check your assumptions, especially when you have a history of bad memory.

So they weren’t ordered by date after all. I’m tempted to swear that they once were, but the git log for that file is too packed to confirm it, so I won’t until I have proof.
Anyway, I just changed that and will deploy it in a bit. If you don’t mind, could you keep us posted on how the number changes in the next few weeks?

Thanks for the patience :slight_smile: