"Listen" Guidance

I could use some guidance on validation. I started working on validating recordings, and very quickly ended up with some questions. First I was surprised to be listening to accents; a bit of thinking quickly cleared this up, and the FAQ actually addresses it (thank you). But there were additional questions. Some recordings had significant gaps before the speech started; is this acceptable? How about ‘mispronounced’ words? Should I try to exercise judgment on what constitutes mispronunciation vs. accent? Or should I exclusively limit myself to assessing whether the recording corresponds with the words?


How about missed words and extraneous words?

Thanks for pointing out, once again, that we urgently need accurate and prominently placed validation criteria. There is also a related issue on GitHub: https://github.com/mozilla/voice-web/issues/273

To answer your question: when you listened to the recording, would you write down what you heard exactly as the text on your screen? If not, press no. If the voice contributor read the sentence twice, the beginning or end is cut off, a word is missing, or the word order is incorrect, press no. If there is a typo in the written text, press no. If the reader comments on the sentence, press no. Only if the reader sticks exactly to the text, press yes.

Laughing, coughing, background noise, etc. are not a problem, because a speech-to-text (STT) program would not transcribe them.


Thanks, this is really helpful.

In lieu of official guidelines, I created the following rules for myself:

  • If a single letter is off, it should be rejected.

  • Pay special attention to things like “we’re” vs “we are”. Many speakers get this wrong or inadvertently shorten words when speaking quickly, e.g. “gonna” instead of “going to”.

  • When it comes to accents, there can be some grey areas in terms of pronunciation. I tend to give a fair amount of leeway and base it on whether I could understand the sentence if spoken to me, but I reject if the pronunciation makes it sound too much like a different word.

  • I tend to ignore mispronunciations like “somethink” instead of “something” because they’re pretty common and are not too different from the correct pronunciation.

  • Ultimately, a lot of it is up to your own discretion. If in doubt, either skip the clip or reject it. There are plenty of other recordings.


I see this the same way as dabinat and agree that the intended sentence must still be recognisable and audible to me.

I am tolerant of recordings by non-native speakers. I also want computers to understand me when I speak a foreign language, and I believe everyone has had the experience abroad of having to make themselves understood, where it is enough simply to be unmistakable.


I started to validate, but almost immediately had questions as to what was correct enough. My second sentence was:

This corps’s distinctive badge was in the shape of that of an acorn.

The audio was of a German gent who pronounced "corps" as "korps" (like "corpse") instead of the correct "korz". While this is probably a common mistake even among English speakers, it is the pronunciation of a different word.

On the other hand, it’s an honest best effort and would be found in the real world.

So, accept or reject?

I would have rejected that. IMO it’s too far from the correct pronunciation, and it sounds like a completely different word.

That’s already discussed here: Discussion of new guidelines for recording validation