Mozilla Voice STT in the Wild!

At Bangor University we’re using DeepSpeech (and Welsh Common Voice) within a Welsh-language digital assistant project. Built with Flutter, our app for Android and iOS can answer simple questions about the weather, news, the time, Welsh-language Wikipedia, and Welsh-language music on Spotify, thanks to a hosted DeepSpeech server.

We’re also evaluating the DeepSpeech team’s recent work on transfer learning for larger domains such as dictation and captioning. Results so far have been very exciting for a lesser-resourced language like Welsh. It would be awesome to see transfer learning supported in main releases of DeepSpeech.
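For context, transfer learning in the DeepSpeech training code is driven by command-line flags on `DeepSpeech.py`. A sketch of what a Welsh fine-tuning run might look like, starting from an English checkpoint (all paths, the alphabet file, and the epoch count here are illustrative, and flag availability depends on the release or branch you are on):

```shell
python3 DeepSpeech.py \
  --train_files cv-cy/train.csv \
  --dev_files cv-cy/dev.csv \
  --test_files cv-cy/test.csv \
  --alphabet_config_path alphabet-cy.txt \
  --load_checkpoint_dir english-checkpoint/ \
  --save_checkpoint_dir welsh-checkpoint/ \
  --drop_source_layers 1 \
  --epochs 30
```

Dropping the final layer(s) of the source model (`--drop_source_layers`) lets the network re-learn its output alphabet for the new language while keeping the lower acoustic layers.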

Thank you so much Mozilla for DeepSpeech!!

7 Likes

My name is Dan, and I’m working on a voice and motion control system called Jaxcore. I’m using DeepSpeech to add speech recognition for controlling computers, home theaters, and smart-home devices.

The upcoming open-source desktop app will have DeepSpeech built in, with voice commands for things like controlling the mouse, typing on your computer, and controlling media players.

You’ll be able to write web games that have speech recognition, using only client-side JavaScript.

The speech recognition libraries are all modular and can be used individually.

3 Likes

I use DeepSpeech as a local STT engine for mycroft.ai. It’s called via the deepspeech-server tool, so I can have multiple devices access it. It runs quite nicely on a desktop CPU. The audio is saved from Mycroft, which I can use for fine-tuning down the road.

I use a fine-tuned model plus some filtering to help. Accuracy is good, and latency is good.
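Running the model behind deepspeech-server means any device can get a transcript with a plain HTTP POST of WAV audio. A minimal stdlib-only client sketch, assuming the server’s default setup (the `http://localhost:8000/stt` URL is illustrative; check your server config for the actual host, port, and endpoint):

```python
import urllib.request

def build_stt_request(wav_bytes, url="http://localhost:8000/stt"):
    """Build the POST request the server expects: a raw WAV body."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )

def transcribe(wav_path, url="http://localhost:8000/stt"):
    """Send a WAV file to the server and return the transcript text."""
    with open(wav_path, "rb") as f:
        req = build_stt_request(f.read(), url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Because it is just HTTP, the same endpoint can serve Mycroft, scripts, and any other device on the network.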

3 Likes

This one? https://github.com/MainRo/deepspeech-server

1 Like

That’s the one. I’ve been using it for a while now.

2 Likes

Our company Iara Health provides a system to aid radiologists in writing medical reports in Brazilian Portuguese. Our entire system is built on DeepSpeech, running locally on the user’s computer.

In the video above, you can see our portal recognizing commands (like loading a template) and handling punctuation, acronyms, and abbreviations. Our system eases the work of radiologists, helping them produce more in less time.

We want to thank Mozilla for DeepSpeech.

11 Likes

We’re using DeepSpeech in tarteel.io to recognize Quran recitation and correct people’s mistakes on a word-by-word level!
The Quran is the Muslim holy book, and we are instructed to recite it with “Tarteel”, which translates best to “slow, measured, rhythmic tones”.
Muslims who try to memorize the Quran sometimes struggle to find an instructor to correct them. The Tarteel platform provides a “Quran Companion” they can recite to when they don’t have anyone to correct them.

8 Likes

Hello.
I’m French and self-taught…

I’m a hobbyist roboticist, and I work on social interactions between robots and humans.
Voice interaction is essential…
Thanks to DeepSpeech, it’s now possible.

I made a tutorial to help others create their own model. :wink:

4 Likes

I’m a computer science student, intrigued by data science, who wants to deploy a Spanish speech-to-text model that is easy to integrate, easy to use, and flexible. I’ve actually found that in some cases my DeepSpeech Spanish model can outperform the Google and IBM Watson speech-to-text models in real situations, with just 450 hours for training, dev, and test.
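Comparisons like this are usually made with word error rate (WER): the word-level edit distance between a reference transcript and the model’s hypothesis, divided by the reference length. A stdlib-only sketch of the metric (the example sentences are illustrative):

```python
def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    r, h = reference.split(), hypothesis.split()
    # Dynamic programming over one row at a time.
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cost = 0 if rw == hw else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1] / len(r)

print(wer("hola como estas", "hola come estas"))  # 1 substitution / 3 words
```

Lower is better; comparing systems on the same held-out recordings keeps the numbers honest.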

4 Likes

We are Vivoce, a startup using DeepSpeech to detect pronunciation errors and help users improve their accents for language learning.

5 Likes

I hope to use DeepSpeech for African languages in the near future.

2 Likes

What languages are you interested in?

We use Mozilla DeepSpeech for voicemail transcription in FusionPBX via our DeepSpeech Frontend and some code we upstreamed into FusionPBX to add support for custom STT providers. Our users find the transcriptions quite useful; Mozilla DeepSpeech has been serving them since August 2018!
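Before a voicemail recording can be handed to an STT engine, it is typically decoded from a WAV container into 16-bit mono PCM samples (DeepSpeech expects 16 kHz, 16-bit mono input). A stdlib-only sketch of that decoding step, round-tripping a tiny synthetic WAV; real pipelines would also resample telephony audio to 16 kHz:

```python
import array
import io
import struct
import wave

def wav_to_samples(wav_bytes):
    """Decode a 16-bit PCM WAV into an array of int16 samples.
    Note: array('h') uses native byte order; WAV data is little-endian."""
    with wave.open(io.BytesIO(wav_bytes)) as w:
        assert w.getsampwidth() == 2, "expected 16-bit PCM"
        frames = w.readframes(w.getnframes())
    return array.array("h", frames)

# Build a tiny 16 kHz mono WAV in memory and round-trip it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))
samples = wav_to_samples(buf.getvalue())
print(list(samples))  # [0, 1000, -1000, 0]
```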

I would love to collaborate with @Gabriel_Guedj and others to build tuned models and deeper integrations in the telephony space. Feel free to reach out, I am @dan:whomst.online in #machinelearning:mozilla.org on Matrix.

Mozilla DeepSpeech is awesome, I really appreciate all the hard work @kdavis, @reuben, @lissyx, and other contributors have put in over the years to build this!

5 Likes

I’m a student at Dalarna University, Sweden, trying to use DeepSpeech to train a model for the Somali language.

2 Likes

Hi victornoriega7,
Could you share your DeepSpeech Spanish model? I am quite far from collecting that many hours of transcribed Spanish data to train my own model.
Thanks,
ana

I am building an online video consultation tool for primary care (e.g. GPs, family medicine) and using DeepSpeech to enable transcription of the meeting. I am currently working on integrating DeepSpeech with Jitsi (an open-source videoconferencing tool).

Potentially, once the integration works, the transcription generated by DeepSpeech can be fed into ML algorithms to generate suggestions for the doctor.
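Live meeting audio is a natural fit for DeepSpeech’s streaming API (`createStream` / `feedAudioContent` / `finishStream` in the Python bindings); most of the surrounding plumbing is slicing 16 kHz, 16-bit mono PCM into small chunks as it arrives from the call. A stdlib-only sketch of that chunking logic (the 20 ms chunk size is illustrative):

```python
SAMPLE_RATE = 16000   # Hz, what DeepSpeech models expect
BYTES_PER_SAMPLE = 2  # 16-bit PCM

def pcm_chunks(pcm, chunk_ms=20):
    """Yield fixed-duration slices of raw 16-bit mono PCM bytes."""
    step = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]

# One second of silence -> fifty 20 ms chunks of 640 bytes each.
chunks = list(pcm_chunks(b"\x00" * SAMPLE_RATE * BYTES_PER_SAMPLE))
print(len(chunks), len(chunks[0]))  # 50 640
```

Each chunk would be converted to int16 samples and passed to the recognizer’s stream, with intermediate results shown as the meeting goes on.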

1 Like

Maybe you could talk to @bernardohenz? He might have advice on adapting DeepSpeech to the medical domain.

I am building a small personal companion that has the following features:

  1. It does not require Wifi or internet connections.
  2. It will be powered with less than 5 volts.
  3. It will auto recharge with light.
  4. If you talk to it, it will talk back to you intelligently with a voice.
  5. The companion can talk to you for an hour a day for 15 years without repeating itself (no overlap).
  6. The companion will have an extensive memory and will learn from you.
  7. It will be no larger than an apple or a large deck of cards.
  8. The per-unit cost will be less than $40 USD.

I am doing this because I can do it and I want to create something interesting.

3 Likes

I am trying to build a model that understands German dialects.

3 Likes