Last week, we mentioned that we would release a guide to using DeepSpeech’s codebase as a powerful starting point for new voice projects. Today, I would like to share the DeepSpeech PlayBook. This work squarely fits under Mozilla’s goal to democratize and diversify voice technology, and is part of a toolkit to help researchers, companies, and any other interested parties use DeepSpeech to build their own voice-based solutions. Please check back in the coming weeks for more about the grant program we’re planning to launch.
When DeepSpeech, Mozilla’s free and open source speech recognition engine, is coupled with data from Common Voice, it can be used to create new speech recognition models, often for languages that haven’t previously been afforded this capability. In turn, these models can be applied in a wide variety of use cases and domains, from recognizing real-time streaming audio to transcribing speech from the web.
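To give a sense of what this looks like in practice, here is a minimal transcription sketch using DeepSpeech’s Python bindings. The model and scorer file names are illustrative, and the recording is assumed to already be 16 kHz, 16-bit mono WAV:

```python
import wave

import numpy as np
import deepspeech  # pip install deepspeech

# Load the acoustic model and (optionally) an external scorer for decoding.
# The file names below are illustrative; substitute your own artifacts.
model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

# DeepSpeech expects 16 kHz, 16-bit mono PCM audio.
with wave.open("audio.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(model.stt(audio))  # the recognized transcript
```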
DeepSpeech has already had significant real-world impact, particularly for underserved languages. For example, Te Hiku Media in Aotearoa, New Zealand, produces content in Te Reo Māori (the Māori language) and uses DeepSpeech for transcriptions. Bangor University, in Wales, has used DeepSpeech and Common Voice to create a voice assistant in Cymraeg, the Welsh language.
The potential is huge, yet it can be overwhelming to get started with DeepSpeech. As with any machine learning tool, there is a lot to learn. While there’s comprehensive documentation, those new to DeepSpeech often face a steep learning curve, which can act as a barrier and in turn dilute DeepSpeech’s democratizing impact. The DeepSpeech PlayBook fills this gap with an end-to-end tutorial that enables DeepSpeech developers to quickly get to “hello world”. Or “bonjour”. Or “selamat siang”. Or “habari”.
Like its Common Voice PlayBook counterpart, the DeepSpeech PlayBook provides guidance on getting started, what success looks like, and common pitfalls. The PlayBook also provides sample code and worked examples for common scenarios, such as training a speech recognition model in a new language using Common Voice data.
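As a taste of that scenario, the sketch below shows the shape of the data preparation step. DeepSpeech’s training scripts consume CSV files with wav_filename, wav_filesize, and transcript columns, while Common Voice releases ship TSV metadata alongside MP3 clips. The paths here are hypothetical, and the clips are assumed to have already been converted to 16 kHz mono WAV; the import_cv2.py importer that ships with DeepSpeech automates this step end to end.

```python
import csv
import os

# Hypothetical paths: a downloaded Common Voice release for one language,
# with clips already converted from MP3 to 16 kHz mono WAV.
CV_DIR = "cv-corpus/cy"
OUT_CSV = "train.csv"

with open(os.path.join(CV_DIR, "train.tsv"), newline="", encoding="utf-8") as tsv, \
     open(OUT_CSV, "w", newline="", encoding="utf-8") as out:
    reader = csv.DictReader(tsv, delimiter="\t")
    writer = csv.writer(out)
    # DeepSpeech's training scripts expect these three columns.
    writer.writerow(["wav_filename", "wav_filesize", "transcript"])
    for row in reader:
        wav = os.path.join(CV_DIR, "clips", row["path"].replace(".mp3", ".wav"))
        if os.path.exists(wav):
            writer.writerow([wav, os.path.getsize(wav), row["sentence"]])
```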
One of the most common hurdles when working with DeepSpeech is the variation between individual environments: operating systems, GPUs, Python versions, and dependencies can all present barriers to beginners. The PlayBook uses Docker to abstract away the complexities of individual environments, and ensures that the PlayBook examples can be used both on a local machine and with cloud computing resources.
Now that DeepSpeech is stabilized and a comprehensive PlayBook is available, Mozilla staff will be moving away from providing routine support. Instead, DeepSpeech will be transitioned to the people and organizations interested in furthering the development of targeted use cases. As we mentioned last week, to support this move, Mozilla will launch a grant program that will fund a number of initiatives aimed at demonstrating applications for DeepSpeech. Projects that contribute to the core technology while showcasing the potential of DeepSpeech to empower underserved areas will be prioritized. The grant submission process will be announced in May.
You can see the DeepSpeech PlayBook here.