How to involve more French speakers and readers


I know summer is not a good time to worry, but this month the French counter of validated hours is frozen… And the number of hours is growing very slowly; I’m afraid the French part of Common Voice will only be ready for my children, in one or two decades… I understand that we need speakers and readers (writing sentences is a closed subject).

So how can we (French contributors on this site) spread the word and get more people involved?
Use social media? Blogs? September is a good time to do that: (almost) everybody comes back from holidays, and motivation and goodwill are at their peak.

I don’t know if I can name here the social media I have in mind? Maybe we can prepare a communication “battle plan” for September.

What do you think?

Thanks all,



Hi David,

You can have a look here, if you haven’t already.

There’s an instant messaging group (“Telegram”) where we’re a few French-speaking people interested in Common Voice and discussing this kind of thing. Feel free to join the discussion and share your ideas!

Generally speaking, I think we should try to reach out to scholars and researchers in linguistics and social sciences. They may know people willing to give their voice, and the potential applications of Common Voice may be of direct interest to them (e.g. it gives linguists a new spoken corpus to study; it’s also a way to improve speech recognition technologies, which may be useful for the interview transcriptions done by sociologists, etc.). I have started to see how the land lies with some colleagues working with spoken corpora. They showed some interest, but my guess is that to really get involved, they need to have a direct and obvious interest in it.

Another kind of institution we could involve is associations giving French lessons to foreign learners. The point for them is that Common Voice gives them sentences to practice their pronunciation; the point for Common Voice is that we would get a wider variety of accents. The problem is probably that we can’t have complete beginners participate, because there would be too many pronunciation mistakes. So in my opinion, it should be limited to intermediate and advanced learners. It would be great to have the point of view of someone involved in this kind of association, to see how realistic it is.

Another observation is that the current corpus lacks regional accents, and accents from countries other than France. I don’t know how we could correct that, but there are millions of French speakers outside of France, and it’s a shame that they’re not involved; reaching out to them would make the corpus grow way faster.

If you want to talk about the project on social media and share it with your friends and contacts, describing the applications that Common Voice may have is a relatively effective way to get people to participate (e.g. how it helps develop voice recognition systems, which may be useful for people with disabilities, etc.). The problem seems to be getting people to participate in the long run, rather than occasionally - even if it’s still great to have people participating!

As I said earlier, I think it would be a good idea; we are just short of people able to work on that, so we’d be happy to help you do that :slight_smile:

I thought there were more people, dedicated ones, to do that; someone told me so.
OK, I can create a group on LinkedIn if it has not already been done.

Maybe we could launch a hashtag on social networks like #FiveMinutesOnCommonVoice and try to promote the fact we can easily contribute just five minutes each day?


Coming up with community-driven initiatives like this is great. Feel free to test some ideas and let’s see what happens :smiley:

Good idea, but IMHO what we really lack, as of today, is a few people stepping up and driving contributions.

Related to this conversation:

Utiliser les enceintes avec assistant vocal pour enregistrer les phrases CommonVoice

Pourquoi pas ? Une application ’ Répète après moi ’ permettrait de recueillir des phrases entendues et non lues par les bénévoles qui donnent leur voix.

En proposant à chaque démarrage de choisir le nombre de mots des phrases, cette ’ skill ’ permettrait à chaque bénévole de régler la difficulté de la répétition, pour éviter les erreurs. La reconnaissance vocale (Speech to Text) permettrait même d’éliminer les erreurs par simple comparaison des textes en sortie et en entrée, avant une validation humaine.

Bien sûr, Alexa ou Google Home ne sont pas libres. Mais Firefox ne fonctionne-t-il pas sur Windows ou Android ? Alors pourquoi pas CommonVoice sur les OS vocaux propriétaires existants ?

Selon moi, les avantages seraient nombreux :

  • pour tous, la possibilité de participer à CommonVoice dans les situations où on n’a pas les mains libres, et une plus grande disponibilité et rapidité de lancement

  • pour CommonVoice, la possibilité d’enregistrer des phrases en situation réelle (quoi de plus réel qu’un fichier provenant d’une enceinte existante)

  • pour les bénévoles sur web ou smartphone, la possibilité de n’avoir que les belles phrases longues, car toutes les petites phrases facile à répéter de mémoire auront été faites sur les enceintes connectées

  • pour les personnes avec handicap moteur ou visuel, la possibilité de participer aussi

  • pour les personnes apprenant une langue, un outil d’entrainement par la répétition

  • pour les personnes avec trouble de la communication (troubles autistiques, phobies, …) la possibilité de s’entrainer aux compétences sociales.

Sans doute, les avertissements de l’utilisation en licence CC0 des fichiers captés nécessiterait le recueil du consentement en s’inscrivant sur le web avec le compte actif dans l’enceinte connectée (ex. Compte gmail pour Google Home). Mais ensuite, une simple application ayant accès à la base de données textuelles CommonVoice suffira à générer une grande quantité de fichiers vocaux.

Qu’en pensent les communautés CommonVoice et DeepSpeech ?

@laubern This category is English-only. I acknowledge we lack an “official” channel for communities to talk in their own languages; we hope to fix this in the near future.

I'm providing a machine translation of your post here, since I find it really interesting:

Use smart speakers with voice assistants to record CommonVoice sentences

Why not? A ‘Repeat after me’ application would collect sentences that are heard, rather than read, by the volunteers who give their voice.

By offering, at each start, a choice of the number of words per sentence, this ‘skill’ would let each volunteer adjust the difficulty of the repetition, to avoid mistakes. Speech recognition (Speech to Text) would even make it possible to eliminate errors by simply comparing the output and input texts, before human validation.
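To illustrate the comparison step mentioned here, below is a minimal sketch in Python. It assumes a speech-to-text transcript is already available as a string; the function names `normalize` and `matches_prompt` and the 0.9 similarity threshold are hypothetical choices for illustration, not part of Common Voice or DeepSpeech.

```python
import difflib
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Lowercase, unify apostrophes, drop punctuation, split into words."""
    text = unicodedata.normalize("NFC", text).lower()
    text = text.replace("’", "'")
    # Keep word characters (accented letters included) and apostrophes;
    # any other run of characters becomes a single space.
    text = re.sub(r"[^\w']+", " ", text)
    return text.split()


def matches_prompt(prompt: str, transcript: str, threshold: float = 0.9) -> bool:
    """Return True when the transcript is close enough to the prompt.

    The threshold is an assumed value; recordings that pass would still
    go on to human validation.
    """
    matcher = difflib.SequenceMatcher(None, normalize(prompt), normalize(transcript))
    return matcher.ratio() >= threshold


if __name__ == "__main__":
    print(matches_prompt("Bonjour, tout le monde !", "bonjour tout le monde"))
    print(matches_prompt("Bonjour tout le monde", "bonjour le monde"))
```

Comparing word lists rather than raw strings means punctuation and capitalization differences do not count as errors, while a dropped or changed word lowers the score.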

Of course, Alexa and Google Home are not free software. But doesn’t Firefox run on Windows and Android? So why not CommonVoice on the existing proprietary voice OSes?

In my opinion, the benefits would be numerous:

  • for everyone, the possibility of taking part in CommonVoice in situations where your hands are not free, with greater availability and a faster launch

  • for CommonVoice, the possibility of recording sentences in real-life conditions (what could be more real than a file coming from an existing speaker?)

  • for volunteers on the web or on a smartphone, the possibility of getting only the nice long sentences, because all the short sentences that are easy to repeat from memory will have been done on the smart speakers

  • for people with motor or visual disabilities, the possibility of taking part too

  • for people learning a language, a tool for training by repetition

  • for people with communication disorders (autistic disorders, phobias, …), the possibility of practicing social skills.

Admittedly, the notice about the CC0 licensing of the captured files would require collecting consent through a registration on the web with the account that is active on the smart speaker (e.g. a Gmail account for Google Home). But after that, a simple application with access to the CommonVoice text database would be enough to generate a large number of voice files.

What do the CommonVoice and DeepSpeech communities think about it?

The “repeat after me” idea is really interesting and could tie in with Mozilla’s text-to-speech project. I like that idea a lot. It would be a simple way to get more natural-sounding sentences. There are definitely users who sound like they’re reading and I’ve always wondered just how useful their contributions actually are.

What I don’t know is whether the licenses of these services allow you to use them to train your own algorithms; I remember reading that they don’t (for obvious reasons).

I don’t think that idea necessarily needs to be limited to smart speakers. A “repeat after me” interface could be integrated into the Common Voice website.

I did post the “repeat after me” idea in English in another topic. But it does not seem to inspire people here, since it has had no reply in 7 days: Use speakers with voice assistant to record CommonVoice sentences

Anyway, I keep thinking Google might accept the use of Google Home to enhance CommonVoice and train DeepSpeech, since they accept the use of TensorFlow.

Sounds like a really good idea. The first one that comes to mind: Fondation Alliance Française

Maybe we can start by focusing on Reddit? There are a lot of subreddits for French speakers all around the world.

Do you have contacts with them?

Why not, but honestly, Reddit is a time sink for me, so I can’t go there.

Nope, I just know them by word of mouth :-/