I want to open this topic because more and more languages are moving to the voice collection phase (thanks to reaching the minimum number of sentences!)
Now that your language is available for voice collection, you might wonder: now what?
Keep in mind that in order to properly train the Deep Speech algorithm for a general (near-human) recognition model, there are a few big challenges:
- Get at least 2000 hours of voice recorded and validated.
- Get at least 1000 different/diverse speakers contributing.
Important: For model training, it is important not to get the same sentence recorded more than once. So please keep in mind that you will need to keep growing your sentence collection to accommodate more voice recordings. The math below assumes 4 seconds per clip on average:
- The initial 5,000 sentences will give you a buffer of around 5.5 hours of voice.
- For 10 hours you would need 9,000 sentences.
- For 100 hours you would need 90,000 sentences.
- For 2,000 hours you would need 1,800,000 sentences.
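If you want to run the same math for your own targets, here is a minimal sketch. It only assumes the 4 seconds per clip average mentioned above and one recording per sentence; the function names are mine for illustration and are not part of any Common Voice tooling.

```python
import math

# Average clip length used in this post; adjust it if your language differs.
SECONDS_PER_CLIP = 4


def sentences_needed(hours, seconds_per_clip=SECONDS_PER_CLIP):
    """Unique sentences needed to fill `hours` of audio, one recording each."""
    return math.ceil(hours * 3600 / seconds_per_clip)


def hours_covered(sentences, seconds_per_clip=SECONDS_PER_CLIP):
    """Hours of audio you can collect from `sentences` unique sentences."""
    return sentences * seconds_per_clip / 3600


if __name__ == "__main__":
    for target_hours in (10, 100, 2000):
        print(target_hours, "hours ->", sentences_needed(target_hours), "sentences")
    print(5000, "sentences ->", round(hours_covered(5000), 1), "hours")
```

If your clips turn out longer or shorter on average, pass a different `seconds_per_clip` to see how the sentence target changes.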
To get the volume of sentences needed for your language, please first check our topic with ideas on where to find high-volume sources.
Currently we are generating a new version of the datasets twice a year and publishing them on our site.
Note that we are asking for an email to send the link to the dataset (instead of direct download) because we want to have a way to contact everyone who downloaded the data in case we get deletion requests from contributors.
We understand that some people might want more frequent releases, and we are working on a more continuous release model to accommodate these needs.
If you want to mobilize your language community, think about:
- How can I make the experience fun so that people donate a lot of clips?
- How can I engage big crowds of diverse people in order to also get more diverse voices contributing?
Let’s use this topic to share some ideas on how we are mobilizing our local communities, from events at universities to new ideas for collecting voices outside the main app…
Let’s get free from the voice silos!