New project: deepspeech websocket server & client


#1

DeepSpeech WebSocket Server

This is a WebSocket server (& client) for Mozilla’s DeepSpeech, to allow easy real-time speech recognition, using a separate client & server that can be run in different environments, either locally or remotely.

Work in progress. Developed to quickly test new models running DeepSpeech in Windows Subsystem for Linux using microphone input from host Windows. Available to save others some time.

Features

  • Server
    • Streams raw audio data from client via WebSocket
    • Streaming inference via DeepSpeech v0.2+
    • Single-user (issues with concurrent streams)
  • Client
    • Streams raw audio data from microphone to server via WebSocket
    • Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
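As a rough illustration of the client-side segmentation described in the feature list, here is a minimal sketch that groups audio frames into utterances. It uses a simple energy threshold as a stand-in for a real VAD; the thread does not say which VAD the client uses, and the names `frame_rms`, `segment_utterances`, `threshold` and `max_silence` are hypothetical, chosen for this example only.

```python
import struct


def frame_rms(frame):
    """Root-mean-square amplitude of a frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[:count * 2])
    return (sum(s * s for s in samples) / float(count)) ** 0.5


def segment_utterances(frames, threshold=500.0, max_silence=3):
    """Yield lists of frames, one list per utterance.

    Assumption: a frame counts as "voiced" when its RMS is at or above
    `threshold`. An utterance ends once `max_silence` consecutive quiet
    frames have been seen; shorter pauses stay inside the utterance.
    """
    current, pending = [], []
    for frame in frames:
        if frame_rms(frame) >= threshold:
            # A short pause turned out to be inside the utterance: keep it.
            current.extend(pending)
            pending = []
            current.append(frame)
        elif current:
            pending.append(frame)
            if len(pending) >= max_silence:
                # Long enough silence: close the utterance, drop the silence.
                yield current
                current, pending = [], []
        # Quiet frames before any speech are ignored as noise.
    if current:
        yield current
```

In a client along these lines, each yielded utterance would be sent over the WebSocket as a sequence of binary messages; the threshold and silence length would need tuning for a given microphone.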

(Lissyx) #2

Nice, just be ready, 0.3.0 is coming :slight_smile:


(Alexander Ploner) #3

Awesome :slight_smile:
Might be a great way to explore possibilities/use DeepSpeech for research projects on mobile devices until there are stable enough ports for Android and iOS.


(Neil Stoker) #4

Looks like a really good idea.

Have you had any success with the client on Linux at all?

I have run into various audio issues with PyAudio, on both my Arch Linux laptop and a Raspberry Pi (which has a Matrix Voice hat for the microphone). I can post more detail later (it’s late now!) but thought I’d check whether you’d had success in an environment like either of those?

You mention you were running on Windows host, so maybe it’s less fiddly there than I’m finding audio on Linux :slightly_smiling_face:


#5

Thanks!

I admit my usage is for the client running on Windows, where pyaudio installed from binary wheels couldn’t be easier.

I haven’t used pyaudio on your 2 platforms, but it worked fine for me on Ubuntu 18.04 recently, once I installed the portaudio19-dev headers and added my user account to the audio group.


(Neil Stoker) #6

Thanks @daanzu. I got it working: the microphone wasn’t set up correctly in PulseAudio, and once I fixed that (and sorted out a small issue with my laptop’s firewall!) it ran between two computers, both running Arch Linux.

It looks like it’ll be very useful - thanks again for putting this great project out there :slight_smile:
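For anyone hitting a similar firewall problem when the client and server are on different machines, a quick check from the client side is whether a plain TCP connection to the server's port succeeds at all. The sketch below uses only the standard library; `can_connect` is a hypothetical helper name, and the host and port are whatever your server is listening on (the thread does not state them).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    This only tests reachability (routing, firewall, something listening);
    it does not perform a WebSocket handshake.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (OSError, socket.error):
        # Refused, timed out, unreachable, or name resolution failed.
        return False
    sock.close()
    return True
```

If this returns False from the client machine but True when run on the server itself against 127.0.0.1, a firewall or the address the server is bound to is a likely suspect.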


(Engineeraashish20) #7

@daanzu: I am trying to set up the server on Ubuntu, but when I ran the command I got the error below. Please advise.

/deepspeech-websocket-server$ python server.py --model …/models/daanzu-6h-512l-0001lr-425dr/ -l -t
Traceback (most recent call last):
  File "server.py", line 4, in <module>
    from bottle import get, run, template
ImportError: No module named bottle

The requirement is already installed, but I am getting the same error.

/deepspeech-websocket-server$ pip install bottle
Requirement already satisfied: bottle in /usr/local/lib/python3.5/dist-packages
You are using pip version 9.0.1, however version 18.1 is available.
You should consider upgrading via the ‘pip install --upgrade pip’ command.


(Engineeraashish20) #8

@daanzu: Also, I am having trouble installing the client on Windows. I tried searching online, but without much success.

I get the error below when running pip install -r requirements-client.txt:

src/_portaudiomodule.c(29): fatal error C1083: Cannot open include file: 'portaudio.h': No such file or directory
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.16.27023\bin\HostX86\x64\cl.exe' failed with exit status 2


#9

These appear to be general Python installation/configuration issues. On Ubuntu, python isn’t seeing the installed package; on Windows, pip should be fetching the binary wheel for pyaudio and should not need to compile anything. Do other Python scripts work? I’d suggest pursuing general Python support resources.
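One way to narrow down the Ubuntu case: the pip output in #7 reports bottle under a python3.5 directory, so it is possible (an assumption, not confirmed in the thread) that `pip` and `python` belong to different interpreters. The sketch below, run with the same `python` command used to start server.py, prints which interpreter is executing and where it finds the module, if at all. `describe_module` is a hypothetical helper name; the code is written to run on both Python 2 and Python 3.

```python
import sys


def describe_module(name):
    """Report the running interpreter and where it loads `name` from."""
    try:
        module = __import__(name)
        location = getattr(module, "__file__", "built-in")
    except ImportError:
        location = "NOT FOUND"
    return "%s (Python %d.%d): %s -> %s" % (
        sys.executable,
        sys.version_info[0],
        sys.version_info[1],
        name,
        location,
    )


if __name__ == "__main__":
    print(describe_module("bottle"))
```

If the reported version differs from the one in pip's path, installing with `python -m pip install bottle` (using the same `python` command) targets the interpreter that will actually run the server.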