Mozilla Voice STT in the Wild!

My name is Dan, and I’m working on a voice + motion control system called Jaxcore. I’m using DeepSpeech to add speech recognition for controlling computers, home theaters, and smart home devices.

The upcoming open-source desktop app will have DeepSpeech built in, with voice commands for doing things like controlling your computer mouse, typing on your computer, and controlling media players.

You’ll be able to write web games that have speech recognition (using only client-side JavaScript).

The speech recognition libraries are all modular and can be used individually.

3 Likes

I use DeepSpeech as a local STT engine for mycroft.ai. It’s called via the deepspeech-server tool, so multiple devices can access it. It runs on a desktop CPU quite nicely. The audio is saved from Mycroft, which I can use for fine-tuning down the road sometime.

There’s a fine-tuned model and I use some filtering to help. Accuracy is good, latency is good.
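For anyone curious what calling deepspeech-server from another device looks like, here’s a minimal sketch; the port, the /stt route, and the LAN address are assumptions based on that project’s defaults, so check your own config:

```python
# Minimal sketch: post a WAV file to a running deepspeech-server instance
# and print the transcription. Assumes the server listens on port 8080 and
# exposes an /stt endpoint accepting raw WAV bytes (check your config.json);
# the LAN address below is a placeholder.
import requests

SERVER_URL = "http://192.168.1.10:8080/stt"

def transcribe(wav_path):
    with open(wav_path, "rb") as f:
        audio = f.read()
    response = requests.post(SERVER_URL, data=audio)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    print(transcribe("utterance.wav"))
```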

3 Likes

This one? https://github.com/MainRo/deepspeech-server

1 Like

That’s the one. Used that for a while now.

2 Likes

Our company, Iara Health, provides a system that helps radiologists write medical reports in Brazilian Portuguese. Our entire system is built on DeepSpeech, running locally on the user’s computer.

In the video above, you can see our portal recognizing commands (like loading a template) and handling punctuation, acronyms, and abbreviations. Our system eases the work of radiologists, helping them produce more in less time.

We want to thank Mozilla for DeepSpeech.

11 Likes

We’re using DeepSpeech in tarteel.io to recognize Quran recitation and correct people’s mistakes on a word-by-word level!
The Quran is the Muslim holy book, and we are instructed to recite it with “Tarteel”, which translates best to “slow, measured, rhythmic tones”.
Muslims who try to memorize the Quran sometimes struggle to find an instructor to correct them. The Tarteel platform provides a “Quran Companion” they can recite to when they don’t have anyone to correct them.

8 Likes

Hello.
I’m French and self-taught…

I’m a roboticist (purely as a passion), and I work on social interactions between robots and humans.
Voice interaction is essential…
Thanks to DeepSpeech.

I made a tutorial to help others create their own model. :wink:

4 Likes

I’m a computer science student, intrigued and interested by data science, who wants to deploy a Spanish Speech-To-Text model that is easy to integrate, easy to use, and flexible. I have actually found that in some cases my Spanish DeepSpeech model can outperform the Google and IBM Watson Speech-To-Text models in real situations, with just 450 hours of data for train, dev, and test.

4 Likes

We are Vivoce, a startup using DeepSpeech to detect pronunciation errors and help users improve their accents for language learning.

5 Likes

I hope to use DeepSpeech for African languages in the near future.

2 Likes

What languages are you interested in?

We use Mozilla DeepSpeech for voicemail transcription in FusionPBX via our DeepSpeech Frontend and some code we upstreamed into FusionPBX to add support for custom STT providers. Our users find the transcriptions quite useful; Mozilla DeepSpeech has been serving them since August 2018!
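For readers who haven’t used the DeepSpeech Python package, the core of a voicemail transcription pass is quite small. This is only an illustrative sketch, not the actual DeepSpeech Frontend code; the model/scorer filenames and the 16 kHz mono WAV assumption are placeholders:

```python
# Illustrative sketch of transcribing a voicemail WAV with the DeepSpeech
# Python package (not the actual DeepSpeech Frontend code). Assumes the
# recording is 16 kHz, 16-bit, mono; model and scorer paths are placeholders.
import wave

import numpy as np
from deepspeech import Model

def transcribe_voicemail(wav_path,
                         model_path="deepspeech-0.9.3-models.pbmm",
                         scorer_path="deepspeech-0.9.3-models.scorer"):
    ds = Model(model_path)
    ds.enableExternalScorer(scorer_path)

    with wave.open(wav_path, "rb") as w:
        frames = w.readframes(w.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)

    return ds.stt(audio)

if __name__ == "__main__":
    print(transcribe_voicemail("voicemail.wav"))
```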

I would love to collaborate with @Gabriel_Guedj and others to build tuned models and deeper integrations in the telephony space. Feel free to reach out, I am @dan:whomst.online in #machinelearning:mozilla.org on Matrix.

Mozilla DeepSpeech is awesome, I really appreciate all the hard work @kdavis, @reuben, @lissyx, and other contributors have put in over the years to build this!

5 Likes

I’m a student at Dalarna University, Sweden, trying to use DeepSpeech to train a model for the Somali language.

2 Likes

Hi victornoriega7,
Could you share your Spanish DeepSpeech model? I am quite far from gathering that many hours of transcribed Spanish data to train my own model.
Thanks,
Ana

I am building an online video consultation tool for primary care (e.g. GPs, family medicine) and using DeepSpeech to enable transcription of the meeting. I am currently working on integrating DeepSpeech with Jitsi (an open-source videoconferencing tool).

Potentially, once the integration works, the transcriptions generated by DeepSpeech could be fed to ML algorithms to generate suggestions for the doctor.
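If it helps anyone planning a similar integration, meeting audio can be fed to DeepSpeech incrementally through its streaming API rather than transcribed in one batch. This is only a rough sketch: a WAV file read in chunks stands in for live audio from the conference bridge, and the model path and 16 kHz mono PCM format are assumptions:

```python
# Rough sketch: incremental transcription with the DeepSpeech streaming API.
# A WAV file read in chunks stands in for live 16 kHz, 16-bit mono PCM
# arriving from the conference bridge; the model path is a placeholder.
import wave

import numpy as np
from deepspeech import Model

def pcm_chunks(wav_path, chunk_ms=500):
    """Yield successive int16 sample chunks from a 16 kHz mono WAV file."""
    with wave.open(wav_path, "rb") as w:
        samples_per_chunk = int(w.getframerate() * chunk_ms / 1000)
        while True:
            frames = w.readframes(samples_per_chunk)
            if not frames:
                break
            yield np.frombuffer(frames, dtype=np.int16)

ds = Model("deepspeech-0.9.3-models.pbmm")
stream = ds.createStream()

for chunk in pcm_chunks("meeting_audio.wav"):
    stream.feedAudioContent(chunk)
    # Intermediate results could drive live captions in the call UI.
    print(stream.intermediateDecode())

print("final:", stream.finishStream())
```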

1 Like

Maybe you could talk to @bernardohenz? He might have advice on adapting DeepSpeech to the medical domain.

I am building a small personal companion that has the following features:

  1. It does not require Wifi or internet connections.
  2. It will be powered with less than 5 volts.
  3. It will auto recharge with light.
  4. If you talk to it, it will talk back to you intelligently with a voice.
  5. The companion can talk to you an hour a day for 15 years, with no overlap.
  6. The companion will have an extensive memory and will learn from you.
  7. It will be no larger than an apple or a large deck of cards.
  8. The per-unit cost will be less than $40 USD.

I am doing this because I can do it and I want to create something interesting.

3 Likes

I am trying to build a model that understands German dialects.

3 Likes

Tēnā koutou katoa!

Te Hiku Media is a Māori organization based in Aotearoa (New Zealand). Our purpose is to preserve and promote te reo Māori, the indigenous language of Aotearoa.

We’ve been using DeepSpeech since May 2018. We found it worked pretty well for te reo Māori, which is an oral language that was phonetically transcribed in the 19th century. We have an API running with a WER of about 10%, and we use this API to help us speed up the transcription of native speaker (L1) recordings.

Deployment
We’re running DeepSpeech in a Docker container on a p2.xlarge Deep Learning Ubuntu AMI in AWS. This sits behind a Load Balancer and an Auto Scaling Group, which lets us bid for those old p2 instances at a relatively affordable cost (we spend about USD $1,000/month to keep the API available 24/7). We use FastAPI to load and run DS in Python. We’ve got a Django instance, koreromaori.io, between this API and what the end user sees (there are reasons for this), but we’re in the process of figuring out how to deploy DeepSpeech more efficiently. Keen to hear what others are doing.
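As a rough illustration (not our production code; the model path and endpoint name are just placeholders), loading DeepSpeech once and exposing it through FastAPI can look like this:

```python
# Minimal sketch of wrapping DeepSpeech in a FastAPI service. The model
# loads once at startup and is shared across requests; uploads are assumed
# to be 16 kHz, 16-bit mono WAV files.
import io
import wave

import numpy as np
from deepspeech import Model
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
ds = Model("model.pbmm")  # placeholder model path

@app.post("/transcribe")
async def transcribe(audio: UploadFile = File(...)):
    data = await audio.read()
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.readframes(w.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    return {"transcription": ds.stt(samples)}
```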

Use
For many reasons, some of which you can learn about in this Greater Than Code podcast and this Te Pūtahi podcast, we’ve built our own Django based web-app to collect te reo Māori data, koreromaori.com [corpora]. We started this around the same time as the Common Voice project and because my experience was in Django, it made more sense for us to work on corpora. Of course in hindsight there are many more reasons why using your own platform for data collection can be useful. For example, all the data is available through an API which helps us when it comes time to train models. We also label data specifically to our context, such as whether a speaker is “native” (L1 vs. L2) or whether pronunciation or intonation is correct. Finally, for indigenous languages, it’s often more appropriate for the data to remain with the community rather than being put in the public domain.

Since we were able to train a DS model early on, we use this model to help us “machine review” data. We also built our own transcription tool, Kaituhi, to help us transcribe our audio archives. It’s kind of like the BBC React Transcript Editor, which I found out about AFTER we started work on Kaituhi. We use koreromaori.io to provide automated transcriptions for Kaituhi, and we’re hoping to add word-level confidences to the transcriptions to speed up the review process (the confidences are in the DS API; they’re just not exposed yet in our API).
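The DeepSpeech Python API exposes per-character tokens with timings, plus a confidence per candidate transcript, via sttWithMetadata, so word-level figures have to be derived. A rough sketch of that grouping (model and audio paths are placeholders) looks like this:

```python
# Rough sketch: group DeepSpeech's per-character token metadata into words
# with start times. Confidence is reported per candidate transcript, not per
# token, so a true per-word confidence would have to be derived separately.
import wave

import numpy as np
from deepspeech import Model

ds = Model("model.pbmm")  # placeholder model path

with wave.open("utterance.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

metadata = ds.sttWithMetadata(audio, 1)  # keep only the best transcript
best = metadata.transcripts[0]

words, current, start = [], "", None
for token in best.tokens:
    if token.text == " ":
        if current:
            words.append((current, start))
        current, start = "", None
    else:
        if start is None:
            start = token.start_time
        current += token.text
if current:
    words.append((current, start))

print("transcript confidence:", best.confidence)
for word, t in words:
    print(f"{t:6.2f}s  {word}")
```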

Kaimahi & Community
Here are some of our team also on this Discourse @utunga @mathematiguy - you may see them ask questions from time to time so please chime in!

The main reason our initiative to build STT for a language that nearly went extinct was successful is the community around the language. We cannot forget the hard work done by so many during the 20th century to make te reo Māori (and other indigenous languages) a living language once again. I think Mozilla has done a good job of building a community around Common Voice. If you’re someone working on language tools for non-mainstream languages, building trust with the right community is critical to solving the data problem. It’s also important to understand that a level of respect and responsibility comes with access to data.

Ngā Mihi
We wouldn’t be where we are today in terms of the technology if it wasn’t for DeepSpeech. So a big thank you to Mozilla and the DeepSpeech team :clap:t4: and all of you who are an active part of the DS and common voice community!

14 Likes