DeepSpeech WebSocket Server
This is a WebSocket server (and client) for Mozilla’s DeepSpeech that enables easy real-time speech recognition. The client and server are separate programs that can run in different environments, either locally or remotely.
Work in progress. It was developed to quickly test new models running DeepSpeech in Windows Subsystem for Linux, using microphone input from the Windows host. It is made available to save others some time.
Features
- Server
  - Receives raw audio data streamed from the client via WebSocket
  - Streaming inference via DeepSpeech v0.2+
  - Single-user only (there are issues with concurrent streams)
- Client
  - Streams raw audio data from the microphone to the server via WebSocket
  - Voice activity detection (VAD) to ignore noise and segment microphone input into separate utterances
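
To illustrate the idea behind the client's utterance segmentation, here is a minimal sketch that is not this project's implementation. It assumes 16 kHz, 16-bit, mono PCM audio split into 20 ms frames, and it uses a simple energy (RMS) threshold as a stand-in for a real voice activity detector. The names `rms`, `segment_utterances`, `threshold`, and `max_silence_frames` are hypothetical and chosen only for this example.

```python
import math
import struct
from typing import Iterable, Iterator, List

# Assumptions for this sketch only: 16 kHz, 16-bit mono PCM, 20 ms frames.
SAMPLE_RATE = 16000
FRAME_MS = 20
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 640 bytes per frame


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def segment_utterances(
    frames: Iterable[bytes],
    threshold: float = 500.0,      # hypothetical loudness cut-off
    max_silence_frames: int = 10,  # quiet frames that end an utterance
) -> Iterator[bytes]:
    """Yield one bytes object per utterance.

    An utterance starts at the first frame at or above `threshold` and
    ends once `max_silence_frames` consecutive quiet frames are seen.
    Quiet frames before speech and trailing silence are dropped.
    """
    voiced: List[bytes] = []   # frames of the utterance in progress
    pending: List[bytes] = []  # quiet frames that may be a short pause
    for frame in frames:
        if rms(frame) >= threshold:
            # Speech resumed: keep the short pause inside the utterance.
            voiced.extend(pending)
            pending.clear()
            voiced.append(frame)
        elif voiced:
            pending.append(frame)
            if len(pending) >= max_silence_frames:
                yield b"".join(voiced)
                voiced, pending = [], []
    if voiced:
        # Flush an utterance that was still open when the input ended.
        yield b"".join(voiced)
```

In a client, each yielded utterance (or each frame within it) could then be sent to the server as a binary WebSocket message. A real detector would classify speech more robustly than a fixed energy threshold, which is sensitive to microphone gain and background noise.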