Invitation: Three talks on Common Voice at FOSDEM'22 tomorrow (Feb. 5th)

Hi everybody,

FOSDEM’22 will be held on 5-6 February 2022. FOSDEM is a two-day event organised by volunteers to promote the widespread use of free and open source software. FOSDEM is widely recognised as the best such conference in Europe.

The Mozilla Devroom at FOSDEM takes place on Feb. 5th.

There are three talks related to Common Voice:

The first one is by Michael Kohler between 11:30-12:00 (UTC+1):

Collecting Sentences for Common Voice

Collecting Sentences through different means to allow others to record voices for them

Common Voice is a project to help make voice recognition open and accessible to everyone. To create this data set, Common Voice allows volunteers to contribute their voice by recording predefined sentences. A good data set needs a lot of recordings, and therefore we need a lot of sentences to be read aloud. In this talk Michael will introduce the audience to several ways we are collecting these sentences and go into more technical detail on these mechanisms. The talk will also feature an intro to Common Voice at the beginning.

The second presentation is by Saverio Morelli between 12:30-13:00 (UTC+1):

“CV Project app”: How an Android app can change the Mozilla Common Voice project

A talk about the “CV Project” app, a native Android app for contributing to Mozilla Common Voice from a smartphone.

The third talk is by Bülent Özden between 14:00-14:45 (UTC+1):

How to Start a Language on Mozilla Common Voice?

A case study for the under-resourced Turkish language

On Mozilla Common Voice, as of December 2021, there are 154 locales, but only 87 have fulfilled the requirements to collect voices, and 27 of those are fairly new. In this two-part presentation, we want to give new language communities some starting points and share the knowledge we accumulated over the last year while working on the under-resourced Turkish language, along with initial training results.

The presentation includes the following topics: resources on Mozilla Common Voice, how to analyze your dataset, how to set goals, how to design a social media campaign, what tools you can use, Google Colabs, Coqui STT, and our roundups on training with the Common Voice Turkish dataset v1 - v7.0. All of this is presented with our successes and failures as the Common Voice Turkish Volunteers group, as lessons learned.

Addendum: Our dataset analysis and training results for the Common Voice v8.0 dataset have been added as new slides and video.

  • FOSDEM’22 is organized as a remote event, and the main talks are pre-recorded and uploaded.
  • Each talk is viewed together in a Matrix chat room where you can ask questions and make comments. The speaker can answer these through chat. You can also use thumbs-up :+1: to vote for someone else's question.
  • After the talk, there will be an online Q&A session, where the most-voted questions come first.
  • The conversation can continue in chat if the Q&A session is not long enough.

You can join the Mozilla Devroom track by clicking here.

You are all invited :slight_smile:


The recordings of the three talks about Common Voice at FOSDEM 2022 are now available: