“There was an idea. The idea was to bring together a group of remarkable people to see if they could become something more.” No, I’m not talking about the Avengers here, but about the core ML team behind DeepSpeech and other open speech tools that, together with you, has been growing and maturing these projects from research to production-readiness. Today this team – Eren Gölge, Josh Meyer, Kelly Davis and myself – is happy and proud that the “something more” has materialized: we are starting a new open source venture purely focused on speech tech, Coqui.
For the last 5 years, we have put speech tech into the hands of low-resource language communities, researchers, and production systems you may have already talked to. Building on the foundation of that work, and thrilled by its growing real-world adoption, we want to take our toolkit to the next level. We want Coqui to become the central home for a vibrant community of researchers, developers, and practitioners who can take advantage of our code, the continuously improving models we release, and the technical support we provide. We want to create the largest open speech community for everyone in speech tech, including researchers, developers, practitioners, companies, and enthusiasts. Furthermore, we want to close the gap between research and production, enabling efficient collaboration between the different roles in the R&D cycle.
Implemented with the TensorFlow framework and simple to integrate into your applications, Coqui STT can run on anything from an offline Raspberry Pi 4 to a server-class machine, obviating the need to pay patent royalties or exorbitant fees for existing STT services. In addition, this ability to run on embedded hardware opens up a myriad of innovative application possibilities – IoT, automotive, robotics, and many more we have yet to explore – while keeping data private and safe.
Coqui TTS provides a set of utilities to help create text-to-speech systems from the ground up, allowing you to build interactive voice interfaces, smart assistants, and accessibility tools. It enables high-quality, natural voice synthesis with results comparable to or better than those of any other commercial or open-source solution. TTS currently serves pre-trained models in 7 languages, with a ready-to-use CLI and server runtimes, to enable open speech synthesis for everyone. Our mission with TTS is to let everyone develop, use, and research speech synthesis without constraints. To make that happen in the near future, we also want to improve the TensorFlow integration and let you use your favorite deep learning library to create your next TTS project. If you want a head start on contributing to TTS, you can check our TODO list.
We know that there’s appetite and immense untapped potential out there for open alternatives in the rapidly growing speech tech market. We know that we bring the right combination of passion, machine learning expertise, and a network in research and industry to prove that these open alternatives can not only exist but succeed. And we hope that you, our numerous contributors and communities, will support us on our path the way you did before.