Hi, I want to build a common speech-to-text server

radamar · March 20, 2018, 4:47pm

A little introduction: I am a student participating in Summer of Code. I am an avid gamer, I love to customize Firefox with userChrome.css, and I code in Java and Haskell.

Here is a cool link for a variety of Firefox mods: https://github.com/Timvde/UserChrome-Tweaks
In my quest for the most minimal desktop interface I hit a roadblock, though. There is hardly any application for speech recognition on Linux, even though there are rich applications for macOS, like Dragon Dictate, which can be hacked to control a browser: https://youtu.be/YalmPQEP54g

I think that speech input is about as fast as a keyboard and as easy to use as a mouse. And there are pretty good models, like:
CMU Sphinx: https://github.com/cmusphinx/sphinx4
Kaldi: https://github.com/kaldi-asr/kaldi
And of course, Mozilla DeepSpeech: https://github.com/mozilla/DeepSpeech

It’s a pretty diverse range of speech-to-text engines. Now, can we make applications take advantage of these successful models? I think yes! Can we make a server, like the Apache web server, that serves transcriptions? An application would send a request to the server, which records audio, converts it to text, and sends the text back to the application. Cool. Can this be done, Mozilla?
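
To make the request/response idea concrete, here is a rough sketch of what such a server could look like, using only the Python standard library. Everything in it is an assumption for illustration: the `/transcribe` path, the `StubEngine` class, and the helper names are hypothetical, the stub does not call DeepSpeech, Kaldi, or Sphinx, and the client sends the audio bytes itself instead of asking the server to record.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubEngine:
    """Hypothetical stand-in for a real engine (DeepSpeech, Kaldi, Sphinx)."""

    def transcribe(self, audio: bytes) -> str:
        # A real engine would decode the audio here; the stub only reports its size.
        return f"<{len(audio)} bytes of audio>"


def make_handler(engine):
    """Build a request handler class bound to one engine instance."""

    class TranscribeHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/transcribe":  # hypothetical endpoint name
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            audio = self.rfile.read(length)
            body = json.dumps({"text": engine.transcribe(audio)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return TranscribeHandler


def start_server(engine, port=0):
    """Serve on the loopback interface only; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), make_handler(engine))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_transcription(port, audio):
    """What a client application would do: POST audio bytes, read back the text."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    try:
        conn.request("POST", "/transcribe", body=audio)
        return json.loads(conn.getresponse().read())["text"]
    finally:
        conn.close()


if __name__ == "__main__":
    server = start_server(StubEngine())
    print(request_transcription(server.server_address[1], b"\x00" * 16))
    server.shutdown()
```

Swapping engines would then only mean passing a different object with a `transcribe(audio)` method to `start_server`, which is the point of having one common server in front of several engines.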

Also see:
Pretty cool demo of Elite Dangerous: https://youtu.be/DRVCkUN_Mq8
Signal programming in Linux: https://www.freedesktop.org/wiki/IntroductionToDBus/

I can’t wait to finish my minimal Firefox setup with voice control. But the cool part about implementing it with D-Bus is that other applications can talk to it too! Games? Intuitive UIs? Idk. The possibilities are endless.

mhenretty · March 21, 2018, 7:30pm

Great ideas, and thanks for your thoughts @radamar.

At this time, Mozilla has no intention of hosting a cloud-based speech-to-text engine. That said, there is nothing stopping you from taking our open source toolkit (DeepSpeech) and its open models, and creating your own cloud API. In fact, that is exactly what our tools are meant for.

If you do so, please update this thread so people can follow along with your progress.

Thanks!
Michael

radamar · March 22, 2018, 12:26am

Thanks, there is a slight correction I want to make. The server doesn’t have to run in the cloud! For example, I use the Gradle build tool, which starts a daemon on my machine that builds applications and can also show compile issues in a web page, all locally. I was thinking of it as a Linux service that anyone can run on their own computer. And maybe building a D-Bus interface to talk to the server?

Cheers

Franck_Dernoncourt · April 6, 2018, 10:45pm

There is hardly any application for speech recognition on Linux

I agree. If you are interested, I listed possible ASR solutions for Linux in “Is there any decent speech recognition software for Linux?” (summary: there are no good solutions).

radamar · April 7, 2018, 12:57am

That information is helpful. I was having some fun yesterday with ‘espeak-ng’ and ‘MARY’ TTS, for example, entering song lyrics into these engines with hilarious results.
Thanks, Franck_Dernoncourt.