New project: deepspeech websocket server & client

daanzu · October 13, 2018, 4:53am

DeepSpeech WebSocket Server

This is a WebSocket server (& client) for Mozilla’s DeepSpeech, to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely.

Work in progress. Developed to quickly test new models running DeepSpeech in Windows Subsystem for Linux using microphone input from host Windows. Available to save others some time.

Features

Server
- Streams raw audio data from client via WebSocket
- Streaming inference via DeepSpeech v0.2+
- Single-user (issues with concurrent streams)
Client
- Streams raw audio data from microphone to server via WebSocket
- Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances

lissyx · October 13, 2018, 9:24am

Nice, just be ready, 0.3.0 is coming

alex_ploner · October 15, 2018, 11:43pm

Awesome
Might be a great way to explore possibilities/use deep seach for research projects on mobile devices until there are stable enough ports for Android and iOS.

nmstoker · November 22, 2018, 12:05am

Looks like a really good idea.

Have you had any success with the client on Linux at all?

I have run into various audio issues with PyAudio, on both my Arch Linux laptop and also on a Raspberry Pi (which has a Matrix Voice hat for the microphone). I can post more detail later (it’s late now!) but thought I’d check if anything like either environment had been successful for you?

You mention you were running on Windows host, so maybe it’s less fiddly there than I’m finding audio on Linux

daanzu · November 22, 2018, 3:39am

Thanks!

I admit my usage is for the client running on Windows, where pyaudio installed from binary wheels couldn’t be easier.

I haven’t used pyaudio on your 2 platforms, but it worked fine for me on Ubuntu 18.04 recently, once I installed the portaudio19-dev headers and added my user account to the audio group.

nmstoker · November 24, 2018, 11:40pm

Thanks @daanzu. I managed to get it working - the microphone wasn’t set up right in PulseAudio and once I got that right (plus figured out a small issue with my laptop’s firewall!) I managed to get it working between two computers, both running Arch Linux.

It looks like it’ll be v useful - thanks again for putting this great project out there

engineeraashish20 · December 16, 2018, 12:18pm

@daanzu: I am trying to make the setup for the server on Ubuntu. But when I tried running the command, I got the below error. Please advice.

/deepspeech-websocket-server$ python server.py --model …/models/daanzu-6h-512l-0001lr-425dr/ -l -t
Traceback (most recent call last):
File “server.py”, line 4, in
from bottle import get, run, template
ImportError: No module named bottle

The requirement is already installed, but I am getting the same error.

/deepspeech-websocket-server$ pip install bottle
Requirement already satisfied: bottle in /usr/local/lib/python3.5/dist-packages
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the ‘pip install --upgrade pip’ command.

engineeraashish20 · December 16, 2018, 12:43pm

@daanzu: Also, I am facing installation issues of client on windows. I tried googling it but not much success.

I get the below error on running (pip install -r requirements-client.txt)

src/_portaudiomodule.c(29): fatal error C1083: Cannot open include file: ‘portaudio.h’: No such file or directory
error: command ‘C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe’ failed with exit status 2

daanzu · December 17, 2018, 3:09am

These appear to be general Python installation/configuration issues. On Ubuntu, python isn’t seeing the installed package; and on Windows, pip should be getting the binary wheel for pyaudio and not need to compile. Do other python scripts work? I’d suggest pursuing general python support resources.

engineeraashish20 · December 17, 2018, 12:31pm

@daanzu: Thanks for the reply. I am able to run the setup, with pre-trained model available at https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz

Do I need to use model/daanzu-6h-512l-0001lr-425dr for the setup?

My client is running on Windows and Server on Ubuntu. Can you please point where I need to make changes, so that client can send the input (.wav) to the server and server can send it back to client (text) (ip for client and server)?

I tried below command, but I don’t see any audio getting saved in C directory. Please refer the screenshot.

daanzu · December 18, 2018, 6:59am

The model/daanzu-6h-512l-0001lr-425dr is just my model directory for testing. Just pass any model directory to use, like the pre-trained model, and use the default names. Or pass each parameter/filename individually.

Currently, the client just listens to the microphone for audio. It would be easy to modify it to read wav files, though: just add wav-file reading code to consumer function. The protocol is dead simple, so it’d be easy to just write a new client, too.

Your command looks good; but the absolute Windows path might be getting parsed wrong. Try “.” for the current directory, or any relative path. It should show a spinner when it hears audio on the microphone, too.

engineeraashish20 · December 18, 2018, 7:43am

@daanzu: I also want real-time streaming of audios. So, I want the client listen to the microphone. I made the changes but somehow I still son’t see any spinner coming up when I speak. It’s dead. Any help?

What I understood till now, is the client will listen the microphone and will create the WAV file and this file will be routed to the server to text transformation. So, I believe I don’t have to make any changes to the client code.

laxmikant04.yadav · June 24, 2019, 11:28am

Hi ,
i get below error while running client file -

Connecting to ‘ws://localhost:8080/recognize’…
ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71
ALSA lib setup.c:547:(add_elem) Cannot obtain info for CTL elem (MIXER,‘IEC958 Playback Default’,0,0,0): No such file or directory
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm.c:2495:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline
ALSA lib pcm_dmix.c:1052:(snd_pcm_dmix_open) unable to open slave

Is this something related to my configuration ? Any help please.

Thanks!

daanzu · June 25, 2019, 12:03am

It uses pyaudio, which uses portaudio. Try searching for that, because the problem appears to be of general audio.

laxmikant04.yadav · June 26, 2019, 4:14am

thanks @daanzu , It seems to be my system’s microphone issue . Once i get that resolve , i will update the progress.

Thanks!

cryptoaimdy · October 22, 2019, 6:08am

Hi,
i am using this to deploy deepspeech on server. but after server starts nothing is shwoing in browser after reaching that port.

(deepspeech-train-venv) yk@andromeda:~/deepspeech-websocket-server$ python server.py --model /home/yk/nestle_project/DeepSpeech/data/model/
Initializing model...
2019-10-22 AM 11:37:55.708: __main__: INFO: <module>(): ARGS.model: /home/yk/nestle_project/DeepSpeech/data/model/output_graph.pb
2019-10-22 AM 11:37:55.708: __main__: INFO: <module>(): ARGS.alphabet: /home/yk/nestle_project/DeepSpeech/data/model/alphabet.txt
TensorFlow: v1.13.1-10-g3e0cc53
DeepSpeech: v0.5.1-0-g4b29b78
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2019-10-22 11:37:55.709031: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-22 11:37:55.808088: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "CPU"') for unknown op: UnwrapDatasetVariant
2019-10-22 11:37:55.808117: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: WrapDatasetVariant
2019-10-22 11:37:55.808123: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "WrapDatasetVariant" device_type: "CPU"') for unknown op: WrapDatasetVariant
2019-10-22 11:37:55.808214: E tensorflow/core/framework/op_kernel.cc:1325] OpKernel ('op: "UnwrapDatasetVariant" device_type: "GPU" host_memory_arg: "input_handle" host_memory_arg: "output_handle"') for unknown op: UnwrapDatasetVariant
Bottle v0.12.17 server starting up (using GeventWebSocketServer())...
Listening on http://127.0.0.1:8080/
Hit Ctrl-C to quit.

deep_learning · January 8, 2020, 11:09am

is it possible to use for model train with different language rather than English language. I am training network with other language and I want to use.

nmstoker · January 8, 2020, 1:52pm

Yes, that should work (although I haven’t tried this).

So long as you can get your model working on the server side then I’ll pretty sure it’ll be fine. That would be very similar to running regular inference on your model.

I don’t think the VAD functionality in the client would be impacted by use with other languages (from a quick Google I couldn’t see anything suggesting it was English only and it seems common sense that detecting voiced vs non-voiced wouldn’t be strongly influenced by the language being spoken)

Therefore I’d suggest giving it a go. Would be great if you could give feedback here on your progress so others know it works.

daanzu · January 8, 2020, 3:14pm

I haven’t tried it, but I think it should work. Feel free to let me know.

deep_learning · January 10, 2020, 10:58am

It is working for other language also. Thank you very much. But I have question about creation language model. I have just 10 words I want to create language model could you provide some way to create language model?