Feature request: Common Voice APIs?

Hi all,
my first post here.

Let me introduce myself:
I’m an Italian researcher (ITD-CNR) working on conversational AI, with a focus on education. I’m currently developing a chatbot to help immigrants learn basic Italian. More generally, I recognize how important it is to have open-source, open-data platforms for speech recognition and synthetic voices (in Italian).

My proposal:
The Common Voice website https://discourse.mozilla.org/c/voice/it is simply great!

Question: how can we increase the number of submitted recordings through more “channels” (not just the website above)? One additional channel I’m thinking about is a set of Common Voice APIs allowing:

1- Speak: submission of recordings ( /speak POST )
2- Listen: validation of user recordings ( /listen POST )

These APIs would be perfect for integration with third-party apps. For example, take the chatbot mentioned above: students run an exercise called “listen and repeat”, where they speak to the chatbot to practice their pronunciation. The bot proposes a word or a phrase to repeat, the voice recording is transcribed into text (by a well-known cloud-based speech recognition service), and the student is rewarded for “correct pronunciation” (a match between the ASR transcript and the original word).
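
A minimal sketch of the “correct pronunciation” check described above, assuming the ASR service returns a plain-text transcript. The function names and the normalization rules (lowercasing, dropping punctuation) are illustrative assumptions, not something the chatbot is known to do:

```python
import string
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # Punctuation becomes a space so "Come stai?" and "come stai" compare equal.
    table = str.maketrans({ch: " " for ch in string.punctuation + "’“”«»"})
    return " ".join(text.translate(table).split())


def is_correct_repetition(prompt: str, asr_transcript: str) -> bool:
    """True when the ASR transcript matches the prompted word or phrase."""
    return normalize(prompt) == normalize(asr_transcript)


print(is_correct_repetition("Buongiorno!", "buongiorno"))  # True
print(is_correct_repetition("Come stai?", "come sta"))     # False
```

An exact match after normalization is the strictest possible rule; a real exercise might prefer a tolerance (for instance an edit-distance threshold), which is left out here.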

So, with these Common Voice APIs, my chatbot could submit to Common Voice sentences already “validated” by a third-party ASR :wink: Also, recordings could be added with user metadata and “validating ASR” metadata.
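
To make the idea concrete, here is a sketch of what a submission carrying user metadata and “validating ASR” metadata could look like. No such API is documented in this thread: every field name below is hypothetical and only illustrates the shape of the data.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AsrValidation:
    engine: str      # hypothetical: name of the third-party ASR
    transcript: str  # what the ASR recognized
    matched: bool    # whether it matched the prompted sentence


@dataclass
class SpeakSubmission:
    locale: str
    sentence: str
    audio_filename: str
    speaker: dict                                  # hypothetical user metadata
    asr_validation: Optional[AsrValidation] = None

    def to_json(self) -> str:
        # asdict() also converts the nested AsrValidation dataclass.
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)


submission = SpeakSubmission(
    locale="it",
    sentence="Buongiorno",
    audio_filename="clip-0001.mp3",
    speaker={"native_speaker": False},
    asr_validation=AsrValidation("example-asr", "buongiorno", True),
)
print(submission.to_json())
```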

BTW, in this scenario the speakers are immigrants in Italy, so their pronunciation is non-native, and I have doubts about the validity of the data submitted.

Does the feature request make sense?
Thanks
giorgio

Hey giorgio,

Since the website works without login, this could already be possible. E.g. a quick search in the developer tools showed this call to get sentences:
https://voice.mozilla.org/api/v1/it/sentences?count=10
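
A small sketch of building that call for any locale. The base URL and the `count` parameter are taken from the URL above; the helper name is made up, and nothing here is confirmed to be a supported public API or says anything about the response format:

```python
from urllib.parse import urlencode

# Base URL as observed in the browser developer tools (see the call above).
API_BASE = "https://voice.mozilla.org/api/v1"


def sentences_url(locale: str, count: int = 10) -> str:
    """Build the URL of the sentences call for a given locale."""
    if count < 1:
        raise ValueError("count must be positive")
    return f"{API_BASE}/{locale}/sentences?{urlencode({'count': count})}"


print(sentences_url("it"))
# https://voice.mozilla.org/api/v1/it/sentences?count=10
```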

I am sure there is another one to send the mp3s.

The bigger question is whether Mozilla wants people to use this API. I like the idea of diversifying sound sources, but it has to be thought through very carefully.

Hi,

There were some experiments in the past:

There are two big open questions:

  1. How do we ensure we are getting legal consent from people to use their recordings and license them under Public Domain?
  2. How can we make sure the quality checks we perform on the Common Voice site are also used by any third party app?

I think this is a good conversation starter, because we need to make sure these two points are covered and we are delivering valid data we don’t have to clean later.

Thanks Stefan for your feedback.

Since the website works without login, this could already be possible.

yes…

E.g. a quick search in the developer tools showed this call to get sentences:
https://voice.mozilla.org/api/v1/it/sentences?count=10

Interesting, that’s the list of the most recently submitted sentences, I presume.

I am sure there is another one to send the mp3s.

Yes, I’m looking for that one.

The bigger question is whether Mozilla wants people to use this API.

That’s the implied question in my original post.
I think an API could introduce issues, for example possible “fake”/malicious sentences… I have to think about it.

I like the idea of diversifying sound sources, but it has to be thought through very carefully.

Yes, in my opinion that’s a potentially big issue, especially for languages with little data at the moment, such as Italian. The risk is ending up with a too “biased” data set. I’ll write about this topic in a new post.

thanks
giorgio

Thanks Ruben for the interesting post you mentioned.

How do we ensure we are getting legal consent from people to use their recordings and license them under Public Domain?

I presume the Common Voice website solves the legal consent question because the user deliberately clicks the record button, right?

So what about a third-party chatbot or mobile app that submits via API?
Maybe the third-party app could explicitly ask for consent during the “registration” phase?
In my chatbot I already do that to comply with GDPR. Maybe the solution could be to just add one more consent, for anonymized public domain sharing? No?
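
As an illustration of that idea, a sketch of how a third-party app might gate submissions on two separate, explicit consents collected at registration. The class and field names are hypothetical, and whether such a consent is legally sufficient is exactly the open question above:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class UserConsent:
    # Timestamps of explicit acceptance; None means "never accepted".
    gdpr_accepted_at: Optional[datetime] = None
    public_domain_accepted_at: Optional[datetime] = None


def may_submit_recording(consent: UserConsent) -> bool:
    """Allow submission only when both consents were explicitly given."""
    return (
        consent.gdpr_accepted_at is not None
        and consent.public_domain_accepted_at is not None
    )


now = datetime.now(timezone.utc)
print(may_submit_recording(UserConsent(gdpr_accepted_at=now)))  # False
print(may_submit_recording(UserConsent(now, now)))              # True
```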

How can we make sure the quality checks we perform on the Common Voice site are also used by any third party app?

Good point. Wait, I see two aspects:

The idea I have in mind is that the app could submit at the same time a “new input sentence text” (not already present in the Common Voice dataset) and the corresponding voice recording.

1 - the input sentences
This is “critical” because these could be unforeseen sentences. Anything. Possible automatic validation: language control (is a sentence labeled as Italian really in Italian?), content control (is the content allowed? This is maybe less “automatable”).

2 - the related voice submission
Here I see a possible advantage in the scenario I described: the sentence could be submitted already validated by an ASR, either by the submitting app (say via the free Wit.ai, Google speech, or whatever) or by Common Voice in the backend… The quality check by human listeners would still have to be done.
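
For point 1, a rough sketch of what automatic checks on a new input sentence could look like. It is only a crude stand-in: the character whitelist catches other scripts and digits but cannot tell Italian from another language written with the same letters (real language control would need a language-identification model), and the word limit and blocklist are placeholder assumptions:

```python
import re
from typing import List

# Assumption: letters used in Italian text plus common punctuation.
ALLOWED = re.compile(r"^[a-zàèéìíîòóùú\s'’.,;:!?\-]+$", re.IGNORECASE)
BLOCKLIST = {"spamword"}  # hypothetical placeholder for content control


def validate_sentence(text: str, max_words: int = 14) -> List[str]:
    """Return a list of problems; an empty list means the sentence passes."""
    problems = []
    words = text.split()
    if not words:
        problems.append("empty")
    elif len(words) > max_words:
        problems.append("too long")
    if words and not ALLOWED.match(text):
        problems.append("unexpected characters")
    if any(w.strip(".,;:!?").lower() in BLOCKLIST for w in words):
        problems.append("blocked content")
    return problems


print(validate_sentence("Buongiorno, come stai?"))  # []
print(validate_sentence("Привет"))                  # ['unexpected characters']
```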

we need to make sure these two points are covered and we are delivering valid data we don’t have to clean later.

I agree
thanks
giorgio