A new 🐸 in (speech tech) town

reuben · March 15, 2021, 9:37am

“There was an idea. The idea was to bring together a group of remarkable people to see if they could become something more.” No, I’m not talking about the Avengers here, but the core ML team behind DeepSpeech and other open speech tools that, together with you, has been growing and maturing these projects from research to production-readiness. Today this team – Eren Gölge, Josh Meyer, Kelly Davis and myself – is happy and most proud that the “something more” has materialized: We are starting a new open source venture purely focused on speech tech, Coqui.

For the last 5 years, we have put speech tech into the hands of low-resource language communities, researchers, and production systems you may have already talked to. Building on the foundation of our work and thrilled by its growing real-world adoption, we want to take our toolkit to the next level. We want Coqui to become the central home for a vibrant community of researchers, developers, and practitioners who can take advantage of our code, the continuously improving models we release, and the technical support we provide. We want to create the largest open speech community for everyone in speech tech, including researchers, developers, practitioners, companies, and enthusiasts. Furthermore, we want to close the gap between research and production, enabling efficient collaboration between the different roles in the R&D cycle.

Implemented with the TensorFlow framework and simple to integrate into your applications, Coqui STT can run on anything from an offline Raspberry Pi 4 to a server-class machine, obviating the need to pay patent royalties or exorbitant fees for existing STT services. In addition, this ability to run on embedded hardware opens up a myriad of innovative application possibilities – IoT, automotive, robotics, and many more things we have yet to explore – while keeping data private and safe.

Coqui TTS provides a set of utilities to help create text-to-speech systems from the ground up, allowing you to create interactive voice interfaces, smart assistants, and accessibility tools. It enables high-quality, natural voice synthesis with results comparable to or better than other commercial or open-source solutions. TTS currently serves pre-trained models in 7 languages with a ready-to-use CLI and server runtimes to enable open speech synthesis for everyone. Our mission with TTS is to let everyone develop, use, and research speech synthesis without constraints. To make that happen in the near future, we also want to improve the TensorFlow integration and let you use your favorite deep learning library to create your next TTS project. If you want a head start on contributing to TTS, you can check our TODO list.

We know that there’s appetite and immense untapped potential out there for open alternatives in the exponentially growing speech tech market. We know that we bring the right combination of passion, machine learning expertise, and a network in research and industry to prove that these open alternatives can not only exist but succeed. And we hope that you, the numerous contributors and communities, will support us on our path the way you did before.

Check out the new homes of our voice projects at github.com/coqui-ai, reach out to us, and stay tuned for more to come.

othiele · March 15, 2021, 3:24pm

Great news that there will be people working full-time on this instead of just a couple of minutes a week.

mrthorstenm · March 15, 2021, 4:10pm

Great to hear.
Wishing you guys all the best for this noble way to go. I’m happily by your side on your way to “free voice” (in my free time).

JGKK · March 15, 2021, 4:40pm

Sorry if I overlooked this and it was mentioned somewhere. How does Coqui relate to future DeepSpeech development?
As I understand it, Coqui STT right now is pretty much a DeepSpeech fork?
Will the two be developed separately in the future, which will make them incompatible at some point?
Will there be future compatibility with models trained for DeepSpeech?
Am I misunderstanding the relation between the two?

Thanks for the amazing work, Johannes

belkacem77 · March 15, 2021, 4:20pm

We were worried during the last few weeks. Thanks everyone. It is time to go on with evangelization.

reuben · March 15, 2021, 5:23pm

Hi Johannes. Coqui has no connection with Mozilla the company beyond the founders being Mozillians. We’ll continue to develop the Coqui STT fork with our open source collaborators, and we have ambitious goals for the codebase in order to make it easier to use, so at some point the forks may become incompatible. As for Mozilla’s plans, the situation hasn’t changed. The last communication about this was this one.