My language is now collecting voice: what do I need to know?


I want to open this topic because more and more languages are moving to the voice collection phase (thanks to having reached the minimum number of sentences!).

Now that your language is available for voice collection, you might wonder: now what?

Keep in mind that in order to properly train the Deep Speech algorithm, there are a few big challenges:

  • Get at least 2000 hours of voice recorded and validated.
  • Get at least 1000 different/diverse speakers contributing.

Important: For model training, it is important that the same sentence is not recorded more than once. So please keep in mind that you will need to keep growing your sentence collection to accommodate more voice recordings. The math below assumes an average of 4 seconds per clip:

  • The initial 5,000 sentences will give you a buffer of around 5.5 hours of voice.
  • For 10 hours you would need 9,000 sentences.
  • For 100 hours you would need 90,000 sentences.
  • For 2,000 hours you would need 1,800,000 sentences.
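
For anyone who wants to redo this math with a different average clip length, here is a minimal Python sketch of the estimate above. It assumes each sentence is recorded exactly once and that clips average 4 seconds; the function names `sentences_needed` and `hours_covered` are made up for illustration and are not part of any Common Voice tool.

```python
import math

SECONDS_PER_HOUR = 3600


def sentences_needed(target_hours: float, avg_clip_seconds: float = 4.0) -> int:
    """Unique sentences needed to reach target_hours of audio,
    assuming every sentence is recorded exactly once."""
    return math.ceil(target_hours * SECONDS_PER_HOUR / avg_clip_seconds)


def hours_covered(sentence_count: int, avg_clip_seconds: float = 4.0) -> float:
    """Hours of audio a sentence pool can provide under the same assumption."""
    return sentence_count * avg_clip_seconds / SECONDS_PER_HOUR


# Print the estimate for the milestones mentioned in this post.
for hours in (10, 100, 2000):
    print(f"{hours} hours -> {sentences_needed(hours)} sentences")

print(f"5000 sentences -> {hours_covered(5000):.2f} hours")
```

If the clips in your language turn out longer or shorter on average, pass a different `avg_clip_seconds` to see how the sentence target changes.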

We are currently working on new ways to collect this volume of sentences easily, and we are looking for technical help to get the volume we need for your language.

As a community volunteer you should not be scared by these big goals, but rather think:

  • How can I make the experience fun, so that people donate a lot of clips?
  • How can I engage big crowds of diverse people, so that more people contribute?

Let’s use this topic to share some ideas on how we are mobilizing our local communities, from events at universities to new ideas for collecting voices outside the main app…

We have a list of activities you can do on Mozilla’s Community Portal.

:speaking_head: :loudspeaker: Let’s get free from the voice silos!


I usually introduce people to the project by talking about the possibility of using a local solution that works without internet and with privacy in mind. I also mention that it can be used for anything you want, because the technology (DeepSpeech) already works, but we need data.
I invite them to donate more than just 5 clips because, as far as I remember, the machine learning model divides the clips by user and not by sentence.

Usually, as I did at the last FOSDEM, I explain that getting your language into the project involves several steps, which in a few cases are already done, such as:

  1. Localize the website in Pontoon
  2. Gather the sentences (and review them)
  3. Do the promotion

Presented this way, people were usually interested, because they can find different ways to contribute to the project based on their time and interest.