When is a recording accurate?

zeno · May 9, 2018, 4:59pm

I would like to ask for clarification on when a recording should be accepted (answer ‘yes’) and when it should be rejected (answer ‘no’).

My current approach is as follows:
I listen to the recording, and if what is spoken matches the displayed sentence exactly (no additional words, and of course no missing words), then I accept it. I do not care about accents, background noise, etc.

Examples where I would give “no”:

“I am going to” vs. “I’m gonna”
“the farmer” vs. “the farmers”
“He told me not to.” vs. “He told me not.”

Does that make sense?
What are your thoughts?

Overall, I have the impression there is no crap content in the recordings, only small slips from time to time.

jef.daniels · May 13, 2018, 9:18am

I tend to do the same. If it’s not literally the same, I pick no.

I must admit that I also sometimes press no when the accent is really too strong. When syllables are stressed incorrectly, or when poor pronunciation could cause confusion (e.g. “share” versus “chair”), I also press no.

What difference do you mean with your second point about the farmer?

zeno · May 14, 2018, 9:50am

That was a typo. Fixed it.

zeno · May 14, 2018, 11:23am

Related issue on GitHub: https://github.com/mozilla/voice-web/issues/273

zeno · May 24, 2018, 11:51am

I also press no if the last phoneme is cut off – I would understand the meaning, but the recording is not complete.

wojtek · September 6, 2018, 9:00am

I read the GitHub issue and it’s still not clear to me how I should rate pronunciation. For example, in “Operating on dynamic data sets is difficult.” someone pronounces “data” as /data/ instead of /ˈdeɪ.tə/ – should I accept it?

mhenretty · September 7, 2018, 3:26pm

@zeno: to answer your original question, you are indeed doing the right thing.

For data, there are actually several valid ways to pronounce it. We should accept any valid pronunciation.

adocampo · September 19, 2018, 7:24pm

So, as far as I can see, there is no problem if the speaker pauses where there is no comma. Only pronunciation matters, right?
