Discussion of new guidelines for recording validation

Yes, this is fine. The algorithm is designed to deal with this.

1 Like

Hi, I’m happy to finally find the guidelines. I believe they should be added somewhere on the main site. Phrases that I found there, “Users validate the accuracy of donated clips, checking that the speaker read the sentence correctly” and “did they accurately speak the sentence?” are just too vague.
I understand that the extensive guidelines are hidden here to avoid scaring new users, but I think they should also be available on the main site for users who prefer to be precise when validating or rejecting clips.

1 Like

@Michael_Maggs
Typo: “out” is written twice in the guidelines.

1 Like

Well spotted! I’ve made the correction.

Are these guidelines going to be published somewhere on the main site? Right now they’re very hard to find.

2 Likes

Hi @EwoutH, welcome to the Common Voice community!

This is definitely something we want to figure out how best to implement on the site in 2020. Right now the development focus is on infrastructure improvements.

We’ll definitely use this work to inform any changes in 2020.

Thanks!

1 Like

I approve any recording that is understandable and matches the text, including incorrect pronunciations as long as they are common ones.

It’s an inconvenient truth that any somewhat non-basic word will have multiple pronunciations around the world, but I don’t think keeping the “technically incorrect” ones out of the dataset is the solution; rather, it’s something that needs to be handled in a way that accounts for it.

1 Like

@Michael_Maggs in case you want to update the post with links to localizations to other languages:

Spanish 📕 Guía para validación de grabaciones en common voice (Guide for validating recordings in Common Voice)

1 Like

@Michael_Maggs thanks again for your consistent work on this (and to all of you for the input)! I see that there has been a break out for validating sentences (for sentence collector) and that most of the criteria listed here are for the act of validation (/listen). In the thread (a while back now, sheesh time flies) we started chatting about the pros/cons of breaking out criteria for /speak vs. /listen vs. one cohesive list. I now agree that a separate set of criteria for each is the best direction (there may be some overlap of course, such as criteria about background noise). I wonder if I’ve missed a post that is focused on suggested recording criteria (for /speak)? If not, is this something you’re still interested in creating?

1 Like

Another new contributor here.

I think it is super important that these be shown to new users.

After reading these I’m realizing I’ve been way too lenient validating clips (giving a “thumbs up” to clips where I could just barely tell what the speaker was trying to say, in the interest of accent diversity).

The first two places I looked for some guidelines were the FAQs, then the About page.

Another couple places that may work are:

  • Account creation screen
  • Bottom of each page, near FAQs, Terms, etc.

I think just an FAQ entry would help a lot of users; I kind of lucked out stumbling across this.

3 Likes

Other questions about validation & quality:

  • Silent time before the speaker starts: is there some hard limit, e.g. “no more than 1 or 2 seconds”, or is this something alignment is able to fix? And what about silence at the end?

  • Audible clicks: in most cases we can hear very noisy mouse clicks. Is this a problem?

  • Hesitation: sometimes users hesitate in the middle of a sentence, either with a small vocal artifact or with a lengthy silence. Is this something alignment deals with, or should such clips be rejected?
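
For the leading-silence question: trailing and leading silence is the kind of thing a simple energy-based trim can often handle before alignment. Here is a minimal sketch of that idea; the function name, threshold, and frame size are illustrative defaults, not Common Voice’s actual pipeline settings.

```python
import numpy as np

def trim_leading_silence(samples, sample_rate, threshold=0.02, frame_ms=20):
    """Drop leading frames whose RMS energy is below `threshold`.

    `samples` is a float array in [-1.0, 1.0]. Threshold and frame size
    are illustrative, not real Common Voice parameters.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if np.sqrt(np.mean(frame ** 2)) >= threshold:
            return samples[start:]   # speech begins in this frame
    return samples[:0]               # the whole clip is silence

# Two seconds of silence followed by one second of a loud tone at 16 kHz:
sr = 16000
silence = np.zeros(2 * sr)
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
clip = np.concatenate([silence, tone])
trimmed = trim_leading_silence(clip, sr)   # the 2 s of silence are removed
```

A real pipeline would use a proper voice-activity detector rather than a fixed RMS threshold, but the principle is the same: moderate leading silence is recoverable, so it need not be a rejection criterion on its own.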

1 Like

Also a new contributor here, and I think that due to the lack of instructions for validators (such as the ones in the OP), the dataset is probably very inconsistent. One plus is that the people who validate a large number of clips are probably more active in the community and have seen this thread.

Can I ask how these guidelines were decided? Who chose them? As someone primarily focused on applications for non-native speakers I think marking any intelligible but incorrect pronunciations as invalid may be a mistake. For instance “rais-ed” although incorrect, is intelligible to native speakers of English and if you’re building a speech recognition system for English, I’d argue that you’d want the system to understand that when someone says “rais-ed”, they mean “raised”. That can only be achieved if examples like that are considered valid in the dataset.
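
One way speech systems account for this is a pronunciation lexicon that lists multiple accepted variants per word, so both “raised” and “rais-ed” map back to the same transcription. A toy sketch of that idea; the ARPAbet-style phone strings and the lexicon itself are hypothetical, not taken from any real Common Voice tooling:

```python
# Hypothetical lexicon: each word maps to a list of accepted
# pronunciations, written as ARPAbet-style phone sequences.
lexicon = {
    "raised": [
        ["R", "EY", "Z", "D"],        # standard pronunciation
        ["R", "EY", "Z", "EH", "D"],  # "rais-ed", as a learner might say it
    ],
}

def matches_word(phones, word, lexicon):
    """True if `phones` is any accepted pronunciation of `word`."""
    return phones in lexicon.get(word, [])

# Both variants map back to the same transcription "raised":
standard = matches_word(["R", "EY", "Z", "D"], "raised", lexicon)
learner = matches_word(["R", "EY", "Z", "EH", "D"], "raised", lexicon)
```

With such a lexicon, keeping intelligible non-native clips in the dataset does not corrupt the labels; it just widens the set of pronunciations the model learns to map to the written word.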

1 Like

Audible clicks and other noise should be considered valid in my opinion. They don’t affect the labeled transcription and any machine learning models trained on the dataset will learn to ignore the sounds. The same can be said for silence and hesitation. If the dataset doesn’t contain these artifacts, when people use products built using the dataset the products won’t be able to handle those artifacts when they frequently occur in the real world.
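
For what it’s worth, this is also what data augmentation does deliberately: clean clips get clicks or noise overlaid while the transcription label stays unchanged, precisely so the model learns to ignore such artifacts. A minimal sketch, where the function name and all parameters are illustrative rather than any real training recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_click(samples, sample_rate, position_s, amplitude=0.8):
    """Overlay a short click-like noise burst on a copy of the audio."""
    out = samples.copy()
    start = int(position_s * sample_rate)
    click = amplitude * rng.standard_normal(int(0.005 * sample_rate))  # 5 ms burst
    out[start:start + len(click)] += click
    return out

sr = 16000
clean = 0.3 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)  # 1 s stand-in for speech
noisy = add_click(clean, sr, position_s=0.5)
# The transcription label is unchanged; only the audio differs.
```

The clip with the click and the clean clip share the same transcription, which is exactly the situation a validator creates by accepting a clip with audible mouse clicks.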

But during the learning phase, a clip is split and aligned with words.
That’s where I wonder (@Tilman_Kamp ?) whether this could negatively affect learning (implying a prejudicial effect on the final model).

This is gold. Where were these when I was first starting out? :slight_smile:

I guess newbies might want to go through these before they even start to dip their toes.

On the other hand, are there any procedures for evaluating beginner contributors? If someone ignores the guidelines systematically (e.g. unwittingly), maybe they could use some help.

1 Like

Hi everyone. Thanks for sharing the new guidelines for this great project. I have already started validating quite a number of voice samples, and I would like to know if I am allowed to record voice samples in languages that are not my native language.

Hi there, if you are a speaker of the language then yes, you definitely should record samples. My advice would be: don’t record anything you don’t know the meaning of, but if you can understand the sentence, then sure, the more voices the better.

Many thanks for your swift response.

1 Like

Hi everybody,
Given the complexity of the guidelines, I think only meticulous people should be allowed to review and accept sentences. I am thinking of my own language, where many people will be able to provide sentences, but when it comes to reviewing, you cannot guarantee that all accepted sentences/recordings are actually correct.
My question is: will there be some kind of admin who does a final review to check quality?
There should even be multiple levels of control if we want to avoid a low-quality dataset.

Cheers
Ibrahima

The quality check is the validation by two other speakers, both in sentence collection and recording. If you have any other specific questions you could make a new topic and tell us a bit about your language and the kind of issues you are finding or expecting to find :slight_smile:
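
Assuming a simple two-reviewer agreement rule (the exact thresholds Common Voice uses may differ, and the function below is only a toy model of the idea), the logic looks roughly like this:

```python
def clip_status(up_votes, down_votes, needed=2):
    """Toy model of multi-reviewer validation: a clip is accepted or
    rejected once `needed` reviewers agree one way; otherwise it stays
    in the queue. Not Common Voice's actual rule, just the shape of it.
    """
    if up_votes >= needed and up_votes > down_votes:
        return "valid"
    if down_votes >= needed and down_votes > up_votes:
        return "invalid"
    return "pending"

first = clip_status(2, 0)   # two reviewers accepted
second = clip_status(0, 2)  # two reviewers rejected
third = clip_status(1, 1)   # reviewers disagree, stays in the queue
```

The point of requiring independent agreement is exactly the concern raised above: no single careless reviewer can push a bad clip into the released dataset on their own.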

1 Like