Continuous streaming without voice activity detection?

I’ve been casually playing around with DeepSpeech and have gotten some simple examples working. What I would like is ‘real time’ speech-to-text conversion, preferably in an input-agnostic way (via a wav stream, say), so that I can pump whatever input I want through DeepSpeech and have it converted on the fly.

I saw @reuben’s response on a HN thread about continuous streaming and output without any voice activity detection and was hoping to get more information on how to go about doing this.
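To make the goal concrete, here is a minimal sketch of the kind of loop being asked about: read a WAV source in fixed-size chunks, push every chunk to the recognizer, and poll for partial text at intervals, with no voice activity detection in between. The `feed` and `poll` callbacks, the helper names, and the 20 ms chunk size are all assumptions made for illustration. With the DeepSpeech 0.6 Python bindings they would presumably wrap the model's streaming calls (`createStream`, `feedAudioContent`, `intermediateDecode`), but that mapping should be checked against the API documentation rather than taken from this sketch.

```python
import io
import wave


def iter_pcm_chunks(wav_source, chunk_ms=20):
    """Yield fixed-duration blocks of raw PCM bytes from a WAV path or file-like object."""
    with wave.open(wav_source, "rb") as wav:
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk


def stream_without_vad(wav_source, feed, poll, poll_every=25, chunk_ms=20):
    """Push every chunk to `feed` and call `poll` periodically for partial text.

    `feed` and `poll` are hypothetical callbacks standing in for the speech
    engine's streaming calls. No voice activity detection is applied, so
    silence is fed through along with speech.
    """
    partials = []
    for index, chunk in enumerate(iter_pcm_chunks(wav_source, chunk_ms), start=1):
        feed(chunk)
        if index % poll_every == 0:
            partials.append(poll())
    return partials


def make_silent_wav(seconds=1, rate=16000):
    """Build an in-memory 16 kHz mono 16-bit WAV of silence for a dry run."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    received = []
    # Stand-in callbacks: collect chunks and report how many have arrived so far.
    stream_without_vad(make_silent_wav(), received.append, lambda: len(received))
    print(len(received), "chunks fed")
```

Passing the engine calls in as callbacks keeps the chunking loop independent of where the audio comes from and of which DeepSpeech version is installed, which is the "agnostic" part of the question.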

I have successfully run the vad_transcriber by doing the following (I’ll be verbose in case anyone else comes along and wants to get the example running):

pip3 install -r requirements.txt
python3 audioTranscript_cmd.py --model ./models/ --audio audio/2830-3980-0043.wav 

Here the models directory is a symbolic link to the deepspeech-0.6.0-models directory.

I have not been able to get the ffmpeg_vad_streaming or mic_vad_streaming examples working.

The ffmpeg_vad_streaming example gives me the following error:

$ ffmpeg -version | head -n1
ffmpeg version 4.1.3-0york1~16.04 Copyright (c) 2000-2019 the FFmpeg developers
$ npm install
...
$ node ./index.js --model ./models/output_graph.pbmm --audio audio/2830-3980-0043.wav  --lm models/lm.binary --trie ./models/trie 
node: symbol lookup error: /home/abe/git/github/mozilla/DeepSpeech/examples/ffmpeg_vad_streaming/node_modules/deepspeech/lib/binding/v0.6.0/linux-x64/node-v64/deepspeech.node: undefined symbol: _ZNK2v86String10Utf8LengthEPNS_7IsolateE

Where, as above, the models directory is a symlink to the deepspeech-0.6.0-models directory.

I’m also having a hard time figuring out how to use mic_vad_streaming properly.

Sorry for the long message. I’m unfamiliar with DeepSpeech and TensorFlow, and I don’t have deep knowledge of Python. Any help pointing me in the right direction would be appreciated.

That is likely a Node.js installation from your distribution’s packages. We only test and support the binaries distributed by Node.js itself.

Thanks, I can now get the nodejs_wav examples working at least.