Hi all,
this is my first post here.
Let me introduce myself:
I’m an Italian researcher (ITD-CNR) working on conversational AI, with a focus on education. I’m currently developing a chatbot to help immigrants learn basic Italian. More generally, I recognize how important it is to have open-source, open-data speech recognition and speech synthesis platforms for the Italian language.
My proposal:
The Common Voice website https://discourse.mozilla.org/c/voice/it is simply great!
Question: how can we increase the number of submitted recordings by adding more “channels” (not just the website above)? One additional channel I’m thinking of is a set of Common Voice APIs that would allow:
1- submission of Speak recordings ( /speak POST )
2- validation of user recordings, as in Listen ( /listen POST )
Such APIs would be ideal for integration with third-party apps. For example, take the chatbot mentioned above: students run an exercise called “listen and repeat”, in which they talk to the chatbot to practice their pronunciation. The bot proposes a word or phrase for the student to repeat, the voice recording is transcribed into text (by a well-known cloud-based speech recognition service), and the student is rewarded for “correct pronunciation” (a match between the ASR transcript and the original word).
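To make the “match” step concrete, here is a minimal Python sketch of how the comparison could work. It is only an illustration, not the chatbot’s actual code: the function names `normalize` and `is_correct_repetition` are made up for this post, and I assume a simple exact match after lowercasing and removing punctuation (accents are kept, since they matter in Italian).

```python
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # Unicode categories starting with "P" cover ASCII and typographic punctuation.
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    return " ".join(cleaned.split())


def is_correct_repetition(target: str, asr_transcript: str) -> bool:
    """Hypothetical check: the student is rewarded only on an exact normalized match."""
    return normalize(target) == normalize(asr_transcript)


print(is_correct_repetition("Come stai?", "come stai"))
print(is_correct_repetition("Perché no?", "perche no"))
```

A stricter or more tolerant rule (for example, an edit-distance threshold) could be swapped in without changing the rest of the flow.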
So, with these Common Voice APIs, my chatbot could submit to Common Voice recordings of sentences already “validated” by a third-party ASR. Also, recordings could be submitted together with user metadata and “validating ASR” metadata.
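As a rough idea of what a client could send, here is a Python sketch that builds the JSON body for the proposed /speak POST. Everything in it is an assumption for discussion: as far as I know no such public endpoint exists, and the field names (`sentence`, `locale`, `audio_base64`, `speaker`, `validating_asr`) and the helper `build_speak_payload` are hypothetical.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeakSubmission:
    """Hypothetical body for the proposed /speak POST (not an existing API)."""
    sentence: str          # text the student was asked to repeat
    locale: str            # language of the sentence
    audio_base64: str      # recording, base64-encoded for transport in JSON
    speaker: dict          # user metadata
    validating_asr: dict   # which ASR "validated" the recording, and its transcript


def build_speak_payload(sentence, audio_bytes, speaker, asr_name, asr_transcript,
                        locale="it"):
    """Serialize one recording plus its metadata to a JSON string."""
    submission = SpeakSubmission(
        sentence=sentence,
        locale=locale,
        audio_base64=base64.b64encode(audio_bytes).decode("ascii"),
        speaker=speaker,
        validating_asr={"name": asr_name, "transcript": asr_transcript},
    )
    return json.dumps(asdict(submission), ensure_ascii=False)


payload = build_speak_payload(
    "Buongiorno", b"\x00\x01", {"native_speaker": False}, "example-asr", "buongiorno"
)
print(payload)
```

A speaker field such as `native_speaker` (again, just an assumed name) would let the dataset keep or filter out non-native recordings later.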
BTW, in this scenario the speakers are immigrants in Italy, so they have a non-native pronunciation, and I have doubts about the validity of the data they would submit.
Does the feature request make sense?
Thanks
giorgio