My language is now collecting voice, what do I need to know?

:open_book: Mozilla Voice Community Playbook: The source of truth for setting up and maintaining self-sustainable communities.


I want to open this topic because more and more languages are moving to the voice collection phase (thanks to reaching the minimum number of sentences!).

Now that your language is available for voice collection, you might wonder: now what?

Keep in mind that in order to properly train the Deep Speech algorithm for a general (near-human) recognition model, there are a few big challenges:

  • Get at least 2000 hours of voice recorded and validated.
  • Get at least 1000 different/diverse speakers contributing.

:warning: Important: For model training, it is important not to have the same sentence recorded more than once. So please keep in mind that you will need to keep growing your sentence collection to accommodate more voice recordings. The math below assumes 4 seconds per clip on average:

  • The initial 5000 sentences will give you a buffer of around 5.5 hours of voice.
  • For 10 hours you would need 9000 sentences.
  • For 100 hours you would need 90000 sentences.
  • For 2000 hours you would need 1800000 sentences.
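
The figures above follow from dividing the target duration by the average clip length. Here is a minimal sketch of that arithmetic in Python, assuming (as in this post) 4 seconds per clip and each sentence recorded exactly once; the function names are just illustrative and not part of any Common Voice tool:

```python
import math

# Average clip length assumed in this post; adjust for your language if needed.
SECONDS_PER_CLIP = 4


def sentences_needed(target_hours, seconds_per_clip=SECONDS_PER_CLIP):
    """Unique sentences needed if every sentence is recorded only once."""
    return math.ceil(target_hours * 3600 / seconds_per_clip)


def hours_covered(sentence_count, seconds_per_clip=SECONDS_PER_CLIP):
    """Hours of voice a given number of unique sentences can provide."""
    return sentence_count * seconds_per_clip / 3600


for hours in (10, 100, 2000):
    print(f"{hours} hours -> {sentences_needed(hours)} sentences")

print(f"5000 sentences -> {hours_covered(5000):.2f} hours")
```

If clips in your language tend to be longer or shorter than 4 seconds, changing `SECONDS_PER_CLIP` gives a rough idea of how the sentence target shifts.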

:information_source: To get the volume of sentences needed for your language, please first check our topic with ideas on where to find high-volume sources.

Currently we generate a new version of the datasets twice a year and publish them on our site.

:information_source: Note that we ask for an email address to send the link to the dataset (instead of offering a direct download) because we want a way to contact everyone who downloaded the data in case we get deletion requests from contributors.

We understand that some people might want more frequent releases; we are working on a more continuous release model to accommodate these needs.

If you want to mobilize your language, think about:

  • How can I make the experience fun, so that people donate a lot of clips?
  • How can I engage big crowds of diverse people, so that more diverse voices contribute?

Let’s use this topic to share some ideas on how we are mobilizing our local communities, from events at universities to new ideas for collecting voices outside the main app…

:speaking_head: :loudspeaker: Let’s get free from the voice silos!


Usually I introduce the project to them by talking about the possibility of a local solution that works without internet and with privacy in mind. I also explain that it can be used for anything you want, because the technology (DeepSpeech) is already working but we need data.
I invite them to donate more than just 5 clips, because, as far as I remember, the machine learning model divides the clips by user and not by sentence.

Usually, as I also did at the last FOSDEM, I explain that getting your language into the project involves different steps, which in a few cases are already done, such as:

  1. Localize the website in Pontoon
  2. Gather the sentences (and review them)
  3. Do the promotion

Usually people were interested when it was presented this way, because depending on their time and interests they can find different ways to contribute to the project.
