Talk to us! How are you using Common Voice?

Hey everyone, I’m Em, the new Lead for Common Voice at Mozilla Foundation :slight_smile: I’ve met some of you already!

We want to hear stories of how all of you in the community are making use of Common Voice, or planning to! This will help to guide the roadmap, and help us to amplify your efforts to have more people contribute. We would love to hear what you’re building and working on - and how we can help.

Let us know, and as ever, feel free to ask us any questions.

Em, Hillary and the team :slight_smile:


I look forward to integrating a voice assistant into Home Assistant which uses only LAN resources.

I love home automation, but I do not want my house controlled by a cloud company :slight_smile:


Hey there! :slight_smile: Here are a few things I am working on:

  • Training speech recognition models that function on-device for languages that don’t already have them (downloadable at
  • Listening-based language learning, basically listening comprehension tasks (demo at
  • Pronunciation training software for second-language learners (in development)

This is amazing :smiley: Do pass on the question to other people you come across doing amazing things with Common Voice :purple_heart:


Hey there,

I love your work!
We use Common Voice for training a Spoken Language Identifier. There is still a lot to do, but we are getting there :slight_smile:
In the near future, we hope to use the metadata to measure and mitigate bias in our system.
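For anyone curious what that metadata work could look like: Common Voice releases ship per-clip metadata in TSV files (e.g. `validated.tsv`) with demographic columns. A minimal sketch of measuring coverage gaps might look like the following; the column names and sample values here are illustrative and may differ between dataset versions, so check the TSV header of the release you download.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for a Common Voice validated.tsv; real releases
# have more columns, and the exact column names can vary by version.
SAMPLE_TSV = """client_id\tpath\tsentence\tgender\tage\taccents
a1\tclip1.mp3\thello there\tfemale_feminine\ttwenties\t
a2\tclip2.mp3\tgood morning\tmale_masculine\tthirties\t
a3\tclip3.mp3\tsee you soon\t\t\t
"""

def demographic_counts(tsv_text, column):
    """Count clips per category of a metadata column, mapping missing
    values to 'unknown' so coverage gaps stay visible."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row.get(column) or "unknown" for row in reader)

print(demographic_counts(SAMPLE_TSV, "gender"))
```

Comparing these counts against your model's per-group error rates is one simple way to spot where the training data under-represents a group.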

Thanks for your work,


Hey, I am not involved in the website, but maybe it is interesting for you that the dictionary uses audio files from Common Voice as samples to show how words are pronounced.


I want to use it for non-cloud-based dictation.

And honestly, the reason I want that is so I can add captions/subtitles to my stream in a way that DOESN’T use Google or Microsoft’s servers. XD;

Speaking of which, is there any chance this kind of dictation could be built into Firefox? I understand the reason why basically all caption/subtitle systems use Google’s servers is because it’s based on the browser’s built in dictation… and right now, the only browser with built-in dictation is Chrome. ^_^;


Hi, our volunteers are going to try to generate some speech datasets to train humanitarian AI applications. We're starting with some simple datasets, then moving on to more complex ones, including different language content and text. The main aim is to generate datasets sufficient to train digital assistants to answer complex queries posed by humanitarian actors using highly structured open data published by aid organizations. The datasets will generally encapsulate different types of information published in aid activity files, read out loud by volunteers.


Hey @brentophillips, @infranscia, @stergro, @bytosaur, @ftyers, and @alfem

Thanks so much for sharing with us how you are using the Common Voice dataset.

Just in case you are not aware, we are hosting two community engagement sessions that you might be interested in taking part in:

Both these sessions are opportunities to get your questions and thoughts shared with us about Common Voice.


Hey everyone, would anyone be interested in giving a lightning talk explaining how they are using the Common Voice dataset at the Contribute-athon Global on 7th October or 14th October?

Please private message me if you would be interested!