New project: DeepSpeech WebSocket server & client


#1

DeepSpeech WebSocket Server

This is a WebSocket server (& client) for Mozilla's DeepSpeech, enabling easy real-time speech recognition. The client and server are separate, so they can run in different environments, either locally or remotely.

Work in progress. Developed to quickly test new models by running DeepSpeech in Windows Subsystem for Linux with microphone input from the Windows host. Shared here to save others some time.

Features

  • Server
    • Streams raw audio data from client via WebSocket
    • Streaming inference via DeepSpeech v0.2+ (see the sketch after this list)
    • Single-user (issues with concurrent streams)
  • Client
    • Streams raw audio data from microphone to server via WebSocket
    • Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
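
For reference, here is a minimal sketch of what the server-side streaming loop looks like, assuming the DeepSpeech v0.2/v0.3 Python API; the file paths and hyperparameter values are placeholders, and the chunks argument stands in for the sequence of binary WebSocket messages of raw 16 kHz 16-bit mono PCM:

    import numpy as np
    from deepspeech import Model

    # Placeholder paths; 26/9/500 are the usual v0.3 n_cep, n_context, and beam width
    model = Model("models/output_graph.pb", 26, 9, "models/alphabet.txt", 500)

    def transcribe_stream(chunks):
        """chunks: iterable of binary WebSocket messages (raw 16 kHz 16-bit mono PCM)."""
        sctx = model.setupStream()
        for chunk in chunks:
            model.feedAudioContent(sctx, np.frombuffer(chunk, np.int16))
        return model.finishStream(sctx)  # final transcript for the utterance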

(Lissyx) #2

Nice, just be ready, 0.3.0 is coming :slight_smile:


(Alexander Ploner) #3

Awesome :slight_smile:
Might be a great way to explore possibilities / use DeepSpeech for research projects on mobile devices until there are stable enough ports for Android and iOS.


(Neil Stoker) #4

Looks like a really good idea.

Have you had any success with the client on Linux at all?

I have run into various audio issues with PyAudio, on both my Arch Linux laptop and also on a Raspberry Pi (which has a Matrix Voice hat for the microphone). I can post more detail later (it’s late now!) but thought I’d check if anything like either environment had been successful for you?

You mention you were running on Windows host, so maybe it’s less fiddly there than I’m finding audio on Linux :slightly_smiling_face:


#5

Thanks!

I admit my usage is for the client running on Windows, where pyaudio installed from binary wheels couldn’t be easier.

I haven’t used pyaudio on your 2 platforms, but it worked fine for me on Ubuntu 18.04 recently, once I installed the portaudio19-dev headers and added my user account to the audio group.
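
In case it helps with debugging on those platforms, this is the kind of minimal check I would run first to confirm PyAudio can see and open the default input device (plain PyAudio API, nothing project-specific):

    import pyaudio

    pa = pyaudio.PyAudio()
    print("Default input device:", pa.get_default_input_device_info()["name"])

    # Open a 16 kHz mono 16-bit stream, matching what the client sends
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                     input=True, frames_per_buffer=1024)
    data = stream.read(1024)
    print("Read", len(data), "bytes from the microphone")
    stream.stop_stream()
    stream.close()
    pa.terminate()

If get_default_input_device_info() raises an error, the problem is in the ALSA/PulseAudio setup rather than in the client.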


(Neil Stoker) #6

Thanks @daanzu. I managed to get it working - the microphone wasn’t set up right in PulseAudio and once I got that right (plus figured out a small issue with my laptop’s firewall!) I managed to get it working between two computers, both running Arch Linux.

It looks like it’ll be v useful - thanks again for putting this great project out there :slight_smile:


(Engineeraashish20) #7

@daanzu: I am trying to set up the server on Ubuntu, but when I run the command, I get the error below. Please advise.

/deepspeech-websocket-server$ python server.py --model …/models/daanzu-6h-512l-0001lr-425dr/ -l -t
Traceback (most recent call last):
  File "server.py", line 4, in <module>
    from bottle import get, run, template
ImportError: No module named bottle

The requirement is already installed, but I am getting the same error.

/deepspeech-websocket-server$ pip install bottle
Requirement already satisfied: bottle in /usr/local/lib/python3.5/dist-packages
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


(Engineeraashish20) #8

@daanzu: Also, I am facing installation issues with the client on Windows. I tried googling it but without much success.

I get the error below when running pip install -r requirements-client.txt:

src/_portaudiomodule.c(29): fatal error C1083: Cannot open include file: 'portaudio.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe' failed with exit status 2


#9

These appear to be general Python installation/configuration issues. On Ubuntu, the python you're running isn't seeing the package pip installed; and on Windows, pip should be fetching the binary wheel for pyaudio rather than compiling it from source. Do other Python scripts work? I'd suggest pursuing general Python support resources.
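
For the Ubuntu case, a quick way to check which interpreter you are actually running (and whether it matches the one pip installed bottle into) is something like the following, run with the same python command that fails:

    import sys
    print(sys.executable)         # path of the interpreter being run
    print("\n".join(sys.path))    # directories it searches for packages

If sys.path doesn't include the dist-packages directory that pip reported, python -m pip install bottle will install bottle into the interpreter you are actually using.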


(Engineeraashish20) #10

@daanzu: Thanks for the reply. I was able to get the setup running with the pre-trained model available at https://github.com/mozilla/DeepSpeech/releases/download/v0.3.0/deepspeech-0.3.0-models.tar.gz

Do I need to use model/daanzu-6h-512l-0001lr-425dr for the setup?

My client is running on Windows and the server on Ubuntu. Can you please point out where I need to make changes so that the client can send the input (.wav) to the server and the server can send the text back to the client (i.e., the IPs for client and server)?

I tried the command below, but I don't see any audio getting saved in the C: directory. Please refer to the screenshot.


#11

model/daanzu-6h-512l-0001lr-425dr is just my model directory for testing. You can pass any model directory, like the pre-trained one, as long as the files use the default names; or pass each parameter/filename individually.
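
For example (hypothetical path; substitute wherever you extracted the 0.3.0 tarball, with the same flags as in your earlier command):

    python server.py --model path/to/extracted/models/ -l -t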

Currently, the client just listens to the microphone for audio. It would be easy to modify it to read WAV files, though: just add WAV-file reading code to the consumer function (see the sketch below). The protocol is dead simple, so it'd be easy to write a new client, too.
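
As a rough sketch of that modification (the URL, chunk size, and "EOS" end-of-stream marker below are assumptions for illustration, not the client's actual values), something like this would stream a 16 kHz mono 16-bit WAV file to the server instead of the microphone:

    import wave
    import websocket  # pip install websocket-client

    ws = websocket.create_connection("ws://localhost:8080/websocket")  # assumed URL
    wav = wave.open("utterance.wav", "rb")
    assert wav.getframerate() == 16000 and wav.getnchannels() == 1
    data = wav.readframes(1024)
    while data:
        ws.send(data, opcode=websocket.ABNF.OPCODE_BINARY)
        data = wav.readframes(1024)
    wav.close()
    ws.send("EOS")    # assumed end-of-utterance marker
    print(ws.recv())  # transcription returned by the server
    ws.close()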

Your command looks good, but the absolute Windows path might be getting parsed wrong. Try "." for the current directory, or any relative path. It should also show a spinner when it hears audio on the microphone.


(Engineeraashish20) #12

@daanzu: I also want real-time streaming of audio, so I want the client to listen to the microphone. I made the changes, but somehow I still don't see any spinner coming up when I speak. It's dead. Any help?

What I have understood so far is that the client listens to the microphone, creates the WAV data, and routes it to the server for transformation to text. So I believe I don't have to make any changes to the client code.