When is a recording accurate?

I would like to ask for clarification as to when a recording counts as accurate (answer ‘yes’) and when it does not (answer ‘no’).

My current approach is as follows:
I listen to the recording, and if I hear exactly the sentence that is shown (no additional words, and of course no missing words), then I accept it. I do not care about accents, background noise, etc.

Examples where I would give “no”:

  1. “I am going to” vs. “I’m gonna”
  2. “the farmer” vs. “the farmers”
  3. “He told me not to.” vs. “He told me not.”
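
The rule behind these examples can be sketched in a few lines of Python. This is only an illustration of the strict word-for-word comparison described above, not how Common Voice works internally: the function names `words` and `is_accurate` are made up, it assumes the listener has written down what they heard, and the regular expression only handles plain English letters.

```python
import re

def words(text: str) -> list[str]:
    # Lowercase, normalise curly apostrophes, and keep only word tokens,
    # so punctuation and capitalisation do not affect the comparison.
    text = text.lower().replace("’", "'")
    return re.findall(r"[a-z0-9']+", text)

def is_accurate(prompt: str, heard: str) -> bool:
    # Strict rule: same words, same order, nothing added, nothing missing.
    return words(prompt) == words(heard)

print(is_accurate("I am going to", "I’m gonna"))
print(is_accurate("the farmer", "the farmers"))
print(is_accurate("He told me not to.", "He told me not."))
```

All three pairs from the list are rejected under this rule, while differences in punctuation or capitalisation alone are ignored.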

Does that make sense?
What are your thoughts?

Overall, I have the impression there is no crap content in the recordings, only small slips from time to time.

I tend to do the same. If it’s not literally the same, I pick no.

I must admit that I also press no sometimes when the accent is really too bad. When I hear that the syllables are stressed in the wrong way, or confusion can occur due to bad pronunciation (e.g. share versus chair), I also press no.

What difference do you mean with your second point about the farmer?

That was a typo. Fixed it.

Related issue on GitHub: https://github.com/mozilla/voice-web/issues/273

I also press no if the last phoneme is cut off – I would understand the meaning, but the recording is not complete.

I read the GitHub issue and it’s still not clear to me how I should rate pronunciation. For example, in “Operating on dynamic data sets is difficult.”, someone pronounces “data” as /data/ instead of /ˈdeɪ.tə/ – should I accept it?

@zeno: to answer your original question, you are indeed doing the right thing.

For data, there are actually several valid ways to pronounce it. We should accept any valid pronunciation.

So, as far as I can see, there is no problem if the speaker pauses where there is no comma. Only pronunciation matters, right?