Discussion of new guidelines for recording validation


Common Voice is intended to provide greater representation for a diverse range of accents, so I don’t think native English speakers should solely dictate what is/isn’t acceptable pronunciation.

I tend to reject pronunciations that are extreme or that sound too similar to a different word, and I skip if I’m not sure; otherwise I try to be generous.

But I agree that the majority of the examples shown should be rejected.

(Megan Branson) #22

@mbranson Thanks for the feedback. If it would be useful, I could work up some similar guidelines to be linked from the Speak page. They could be largely based on the same examples, but the focus would need to be slightly different.

Makes sense @Michael_Maggs, they are indeed different interactions, but I’d be cautious not to overwhelm people with the amount of information provided. How might we convey guidelines for both Speak and Listen in one document so they complement and inform one another? cc @nukeador

(Michael Maggs) #23

Thanks @mbranson. I’ve been thinking about how best to achieve that, and I’m not entirely sure how it would work. Some of the explanations will inevitably have to be different for speaking and reviewing, and putting both into the same document would make it quite unwieldy, I’d have thought. I suppose one could have instructions in two columns, with common examples, but that wouldn’t be very user-friendly. Or separate explanatory texts with links to a common set of examples, but that would require multiple click-throughs. Did you have something specific in mind?


How do you deal with situations where the person hesitates and elongates the word? DeepSpeech does need to be able to deal with drawn-out or over-emphasized letters after all.

I generally approve as long as the person doesn’t break the word.


“I wasn’t s…sure” = reject
“I wasn’t sssssure” = approve

I just wanted to see how others dealt with it. It’s helpful if we’re all operating by the same rules.

(Michael Maggs) #25

I do the same as you. Accept if it’s an elongation; reject if the reader takes two attempts to start the word.

(Rubén Martín) #26

This conversation is great, thanks everyone involved.

@Michael_Maggs would you be interested in maintaining an updated first post here with all the suggestions we have been hearing (maybe signaling must-have vs nice-to-have)? Once we feel comfortable with the list, we could run an exercise with our great User Experience team to turn it into something more visual and faster to scan on the site. :smiley:


(Michael Maggs) #27

Yes I’d be happy to do that. Will update in the next day or two.

Perhaps it would make sense for me to separate the guidelines for validation of uploaded sentences out into a new thread, as those will mostly be of use in the Sentence Collector.


I’m curious what fellow validators think about this: https://github.com/mozilla/voice-web/issues/1927

(Rubén Martín) #29

I’ll quote your message there for reference:

Recently a change was made to the site to list sentences with the fewest recordings first in order to add more unique sentences for the DeepSpeech model. I think that this was a good idea overall, however I’m starting to see something that could be a problem.

Some users are recording a LOT of sentences. In fact, over the past few days I have validated around 1500-2000 clips and I would estimate around 70% of them were recorded by the same user, all of which were unique sentences.

I’m sure that the DeepSpeech team makes certain that there aren’t too many recordings by a single user, so these sentences will most likely be discarded until there are more recordings available. But if the site shows sentences with the fewest recordings first, it will have to go through the thousands of unrecorded sentences to get to that point again, which may never happen if more sentences keep getting added.

The DeepSpeech team said they don’t want more than a few hundred recordings from any one user. So a user with 5000 recordings may have prevented 4700 sentences from making it into the model.

So I think the solution to this is one of the following:

1. Put a hard limit on the total number of recordings users can make, or have a daily per-user limit.

2. Change the algorithm so that each sentence has, say, a minimum of 3 recordings before it’s given a lower priority in the queue.
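As a minimal sketch of what option 2 might look like, the recording queue could sort sentences so that anything below a minimum recording count always comes first (the field names and the threshold of 3 are assumptions for illustration, not the actual voice-web implementation):

```python
# Hypothetical sketch of option 2: sentences keep top priority until they
# reach a minimum number of recordings. Field names and the threshold are
# illustrative only.

MIN_RECORDINGS = 3

def queue_order(sentences):
    """Order sentences so that any sentence below the minimum recording
    count comes before all sentences that have already reached it, and
    within each group, fewest recordings first."""
    return sorted(
        sentences,
        key=lambda s: (s["recordings"] >= MIN_RECORDINGS, s["recordings"]),
    )

queue = queue_order([
    {"text": "a", "recordings": 5},
    {"text": "b", "recordings": 0},
    {"text": "c", "recordings": 2},
])
# "b" and "c" (below the minimum) come before "a"
```

Under this ordering, new unrecorded sentences would still surface quickly, but not at the expense of sentences that haven’t yet reached the minimum.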

In the coming weeks we will be working on a few experiments involving personal goals and also invite more people to contribute since, as we have commented in the past, diversity is super important.


FYI, I edited the post since then to clarify a couple of things I thought were unclear, but probably most relevant is that I thought of an additional option:

  3. Deprioritize recordings in the validation queue from users who already have x validated recordings, so we’re prioritizing unique users AND unique sentences.

In my opinion some combination of options 2 and 3 is best.
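Combining options 2 and 3 might look something like the sketch below for the validation queue, where clips from heavily represented users sort after everyone else (the names and the per-user cap are assumptions; the DeepSpeech team’s limit was only described as “a few hundred”):

```python
# Hypothetical sketch combining options 2 and 3: in the validation queue,
# clips from users who already have many validated recordings sort after
# clips from less-represented users. Names and thresholds are illustrative.

USER_CAP = 300  # stand-in for the "few hundred" per-user limit

def validation_order(clips, validated_per_user):
    """Order clips so under-represented users come first, then by how
    many validated recordings each clip's user already has."""
    def key(clip):
        count = validated_per_user.get(clip["user"], 0)
        return (count >= USER_CAP, count)
    return sorted(clips, key=key)

ordered = validation_order(
    [{"id": 1, "user": "prolific"}, {"id": 2, "user": "newcomer"}],
    {"prolific": 5000, "newcomer": 10},
)
# the newcomer's clip sorts before the prolific user's
```

The point of the two-part sort key is that a prolific user’s clips are never dropped outright, just moved behind clips that add more speaker diversity.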

(Michael Maggs) #31

Since I prepared the draft guidelines at the top of this thread, the number of sentences for recording verification that have errors in the written text has decreased considerably - probably because most have now gone through the Sentence Collector.

To avoid overloading the recording reviewer guidelines, I’d now suggest removing the “Problems with the written text” section entirely. It can be moved into a new guideline for sentence reviewers. At present, all they have is a single paragraph on this page: https://common-voice.github.io/sentence-collector/#/how-to.

(Michael Maggs) #32

I’ve moved the sentence review guidelines over to this thread: Discussion of new guidelines for uploaded sentence validation

(sgqy) #33

Hello. What should I do if the audio/voice is generated by robots (TTS)? :sweat_smile:

(Michael Maggs) #34

You should reject it. If you’re finding large numbers of instances, it would be worth posting a few examples here so that the developers can remove the whole batch by script, if need be.

(Fernando) #35

I wonder whether registration should be mandatory, and whether there should be a moderator, to avoid pranks that create a lot of extra work. Thanks.