My language is now collecting voice, what do I need to know?

I want to open this topic because more and more languages are moving to the voice collection phase (thanks to having collected enough sentences!)

Now that your language is available for voice collection, you might wonder: now what?

Keep in mind that in order to properly train the Deep Speech algorithm, there are a few big challenges:

  • Get at least 2000 hours of voice recording.
  • Get at least 1000 different/diverse speakers contributing.
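
To get a feel for what these two numbers mean for a single contributor, here is a rough back-of-the-envelope sketch. The 5-second average clip length and the even split across speakers are my own assumptions for illustration, not official project figures.

```python
import math

# Goals taken from the list above (both are minimums).
TARGET_HOURS = 2000
TARGET_SPEAKERS = 1000

# Assumption for illustration only: average length of one recorded clip.
ASSUMED_CLIP_SECONDS = 5


def clips_needed(hours: float, clip_seconds: float) -> int:
    """Number of clips of the given length needed to fill `hours` of audio."""
    return math.ceil(hours * 3600 / clip_seconds)


# Assumption: the recording time is split evenly across the minimum number of speakers.
hours_per_speaker = TARGET_HOURS / TARGET_SPEAKERS
total_clips = clips_needed(TARGET_HOURS, ASSUMED_CLIP_SECONDS)
clips_per_speaker = clips_needed(hours_per_speaker, ASSUMED_CLIP_SECONDS)

print(f"Hours per speaker: {hours_per_speaker}")
print(f"Total clips: {total_clips}")
print(f"Clips per speaker: {clips_per_speaker}")
```

With more than the minimum number of speakers, the share per person drops, which is one more reason to reach as many people as possible.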

As a community volunteer, you should not be scared by these big goals, but rather think:

  • How can I make the experience fun so that people donate a lot of clips?
  • How can I engage big crowds of diverse people so that more people contribute?

Let’s use this topic to share some ideas on how we are mobilizing our local communities, from events in universities to new ideas for collecting voices outside the main app…

We have an early list of potential activities to do in this topic.

:speaking_head::loudspeaker: Let’s get free from the voice silos!

Usually I introduce the project to them by talking about the possibilities of a local solution that works without internet and with privacy in mind. I also explain that it can be used for anything you want, because the technology (DeepSpeech) is already working, but we need data.
I invite them to donate more than just 5 clips because, as far as I remember, the machine learning model splits the clips by user and not by sentence.

Usually, as I also did at the last FOSDEM, I say that getting your language into the project involves several steps, which in a few cases are already done, such as:

  1. Localize the website in Pontoon
  2. Gather the sentences (and review them)
  3. Do the promotion

Usually people were interested in this approach because, depending on their time and interests, they can find different ways to contribute to the project.