DeepSpeech Node.js User Transcription Project

aadeojo · December 10, 2019, 7:43pm

I am interested in creating a Node.js application that takes user voice input and transcribes it. The end goal is to create an application that is completely voice controlled. While researching tools for this, I came across Mozilla's DeepSpeech, which I support and appreciate.

A Few Questions

1. I installed DeepSpeech using npm install deepspeech; is this all I need?
2. What is the best format for capturing user audio on the client side (JavaScript), and how should I capture it?
3. On the server side, what is the best way to receive the data, transcribe the audio, and send the result back to the user?

This is the basic functionality I would like to test for this project: a homepage that you speak into and that returns the transcribed words using DeepSpeech. Any references to helpful tutorials and documentation would be greatly appreciated.

lissyx · December 11, 2019, 6:46am

Yes. You can have a look at the API: https://deepspeech.readthedocs.io/en/v0.6.0/NodeJS-API.html

As documented, it depends on the trained model, but everything defaults to 16-bit PCM WAV at 16 kHz.
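If the audio is captured with the Web Audio API in the browser, it typically arrives as Float32 samples in the range [-1, 1] at the AudioContext's rate (often 44.1 or 48 kHz), so it has to be converted before DeepSpeech can use it. A minimal sketch, assuming mono input; toPcm16k is a hypothetical helper name, and it uses plain linear interpolation with no anti-aliasing filter, which may be acceptable for a prototype but is not a proper resampler:

```javascript
// Hypothetical helper: convert Float32 samples in [-1, 1] (as produced by
// the Web Audio API) into 16-bit signed PCM at 16 kHz.
// Assumes mono input and inputRate >= targetRate.
function toPcm16k(samples, inputRate, targetRate = 16000) {
  if (inputRate < targetRate) {
    throw new RangeError('upsampling is not supported by this sketch');
  }
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Linear interpolation between the two nearest input samples
    // (no low-pass filtering, so some aliasing is possible).
    const pos = i * ratio;
    const idx = Math.floor(pos);
    const frac = pos - idx;
    const next = idx + 1 < samples.length ? samples[idx + 1] : samples[idx];
    const value = samples[idx] + (next - samples[idx]) * frac;
    // Clamp to [-1, 1] and scale to the signed 16-bit range;
    // Int16Array truncates the fractional part on assignment.
    const clamped = Math.max(-1, Math.min(1, value));
    out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return out;
}
```

In the browser you might call it as toPcm16k(audioBuffer.getChannelData(0), audioContext.sampleRate) and send the resulting .buffer to the server. Int16Array uses the platform's byte order, which is little-endian on practically all machines browsers run on.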

Actually, you should be able to do everything client side.

If you really want to do a client/server architecture, then the client/server communication is really up to you, and you just have to use the deepspeech bindings server-side.
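As an illustration of the client/server option, here is a minimal sketch using only Node's built-in http module: the client POSTs raw 16-bit, 16 kHz mono PCM to a hypothetical /stt endpoint and gets JSON back. The transcribe function is a placeholder standing in for the DeepSpeech call (something like model.stt(buffer); verify the exact constructor and method signatures in the API docs for the version you install):

```javascript
const http = require('http');

// Hypothetical placeholder: replace the body with a call into the
// DeepSpeech bindings (e.g. model.stt(pcmBuffer)), checking the exact
// signature against the API docs for your version.
function transcribe(pcmBuffer) {
  return `received ${pcmBuffer.length / 2} samples`;
}

// Accepts POST /stt whose body is raw 16-bit, 16 kHz mono PCM and
// replies with {"text": ...}. The transcriber is injectable for testing.
function createSttServer(stt = transcribe) {
  return http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/stt') {
      res.statusCode = 404;
      res.end();
      return;
    }
    // Collect the whole request body before transcribing.
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => {
      const audio = Buffer.concat(chunks);
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ text: stt(audio) }));
    });
  });
}
```

Start it with createSttServer().listen(3000). Passing the transcriber in as a parameter keeps the HTTP plumbing testable without loading a model.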

aadeojo · December 11, 2019, 7:17pm

Alright, so I would follow the methods in the documentation. Also, just to make sure I understand correctly: DeepSpeech can be used client side, so instead of sending the audio back to the server to transcribe it, the STT can be done on the client?
Just a few more questions:

Do I need any other dependencies besides npm install deepspeech?
I do not know much about training models. Would I install the Common Voice English model on the client side? Would I place it within the Node.js library?

lissyx · December 11, 2019, 7:50pm

Yes

The minimum should be pulled in by DeepSpeech's declared dependencies.

You don't have to. Please read the documentation; a crash course on using the pretrained English model is already there.

Johannes_Beiser · December 16, 2019, 12:36pm

We are talking about a native client here, right? OP didn't specify what kind of application he is developing. For web applications, client-side only isn't possible yet, right?

lissyx · December 16, 2019, 12:40pm

That’s right. Without more context, we can’t know.

aadeojo · December 17, 2019, 11:51pm

Yes, a JavaScript application using Node.js.

Johannes_Beiser · December 18, 2019, 2:41pm

Alright, in this case you need the Node server. DeepSpeech can't run client side in the browser yet. WebSockets are a good way to stream the audio to the server. If it's a fixed-length recording, you can use XHR to send the blob/buffer. To capture the audio in the browser you could, for example, use the "Media Capture and Streams API" and the "MediaStream Recording API".
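Note that browsers' MediaRecorder usually produces compressed formats such as WebM/Opus, Ogg or MP4 rather than WAV, so the audio either has to be converted on the client or decoded on the server. If the client does send a WAV blob over XHR, the server still needs to check its format and strip the header before handing the samples to DeepSpeech. A minimal sketch of that step, assuming a RIFF/WAVE file with a standard fmt chunk; extractPcm16k is a hypothetical helper, not part of the DeepSpeech API:

```javascript
// Hypothetical helper: parse a RIFF/WAVE Buffer, check that it is
// 16-bit PCM, mono, 16 kHz, and return only the sample data.
function extractPcm16k(wav) {
  if (wav.length < 12 ||
      wav.toString('ascii', 0, 4) !== 'RIFF' ||
      wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  let fmt = null;
  let offset = 12;
  // Walk the chunks: each has a 4-byte id and a 4-byte little-endian size.
  while (offset + 8 <= wav.length) {
    const id = wav.toString('ascii', offset, offset + 4);
    const size = wav.readUInt32LE(offset + 4);
    const start = offset + 8;
    if (id === 'fmt ') {
      fmt = {
        format: wav.readUInt16LE(start), // 1 = integer PCM
        channels: wav.readUInt16LE(start + 2),
        sampleRate: wav.readUInt32LE(start + 4),
        bitsPerSample: wav.readUInt16LE(start + 14),
      };
    } else if (id === 'data') {
      if (!fmt) throw new Error('data chunk before fmt chunk');
      if (fmt.format !== 1 || fmt.channels !== 1 ||
          fmt.sampleRate !== 16000 || fmt.bitsPerSample !== 16) {
        throw new Error(`unsupported format: ${JSON.stringify(fmt)}`);
      }
      return wav.subarray(start, Math.min(start + size, wav.length));
    }
    // Chunks are padded to an even number of bytes.
    offset = start + size + (size % 2);
  }
  throw new Error('no data chunk found');
}
```

Rejecting anything other than 16-bit, 16 kHz mono here, rather than silently passing it on, makes format mismatches show up as a clear error instead of a garbage transcript.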

lissyx · December 18, 2019, 3:47pm

It also works quite well and is easy to write using the streaming part of the API.
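The streaming API lets the server feed audio as it arrives instead of waiting for the whole recording. A minimal sketch that feeds a PCM buffer in 20 ms frames; it assumes a stream object with feedAudioContent(buffer) and finishStream(), as exposed by the Stream object in later (0.7 and newer) Node.js bindings. The streaming calls may be shaped differently in other releases, including the v0.6.0 docs linked above, so check the docs for the version you install. transcribeInFrames is a hypothetical name:

```javascript
// Hypothetical helper: feed 16-bit PCM to a DeepSpeech-style stream in
// fixed-size frames, then finish the stream to get the final transcript.
// Assumes `stream` has feedAudioContent(buffer) and finishStream().
function transcribeInFrames(stream, pcm, frameMs = 20, sampleRate = 16000) {
  // Samples per frame times 2 bytes per 16-bit sample.
  const frameBytes = Math.round((sampleRate * frameMs) / 1000) * 2;
  for (let off = 0; off < pcm.length; off += frameBytes) {
    stream.feedAudioContent(pcm.subarray(off, off + frameBytes));
  }
  return stream.finishStream();
}
```

With WebSockets you would instead create one stream per utterance (model.createStream() in the later bindings), call feedAudioContent for each incoming message, and call finishStream when the client signals the end of speech.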

solyarisoftware · January 27, 2021, 4:46pm

Socket.IO is also a possible all-in-one communication layer on top of WebSockets, enabling developers to manage both pull mode (client-server request-reply) and push mode (unsolicited notifications, e.g. audio messages, from server to clients).