DeepSpeech Node.js User Transcription Project

I am interested in creating a Node.js application that takes spoken user input and transcribes it. The end goal is an application that is completely voice controlled. While researching utilities for this I found DeepSpeech by Mozilla, which I support and appreciate.

A Few Questions

  1. I installed DeepSpeech using npm install deepspeech. Is this all I need?
  2. What is the optimal format and way to receive user audio in client-side JS?
  3. On the server side, what is the best way to receive the data, and how would I transcribe the audio and post it back to the user?

This is pretty much the basic functionality that I would like to test out for this project: a homepage that you speak into and that gives back the transcribed words using DeepSpeech. Any references to helpful tutorials and documentation would be greatly appreciated.

Yes. You can have a look at the API: https://deepspeech.readthedocs.io/en/v0.6.0/NodeJS-API.html

As documented, it depends on the trained model, but everything defaults to WAV PCM, 16 bits, 16 kHz.
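To make the format requirement concrete, here is a minimal sketch of checking an uploaded WAV buffer on the Node.js side before handing it to the recognizer. It assumes a canonical 44-byte header in which the "fmt " chunk directly follows the RIFF header, which is common but not guaranteed for every file. The function names `readWavFormat`, `isPcm16At16k` and `makeWavHeader` are made up for this illustration and are not part of DeepSpeech.

```javascript
// Read the format fields from a canonical 44-byte RIFF/WAVE header.
// Assumption: the "fmt " chunk sits at byte offset 12.
function readWavFormat(buffer) {
  if (buffer.length < 44 ||
      buffer.toString('ascii', 0, 4) !== 'RIFF' ||
      buffer.toString('ascii', 8, 12) !== 'WAVE' ||
      buffer.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('Not a canonical RIFF/WAVE header');
  }
  return {
    audioFormat: buffer.readUInt16LE(20),    // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24),
    bitsPerSample: buffer.readUInt16LE(34),
  };
}

// True when the header describes 16-bit PCM at 16 kHz.
function isPcm16At16k(format) {
  return format.audioFormat === 1 &&
         format.sampleRate === 16000 &&
         format.bitsPerSample === 16;
}

// Hypothetical helper for trying the check out: builds a header-only WAV.
function makeWavHeader(sampleRate, bitsPerSample, channels) {
  const header = Buffer.alloc(44);
  const blockAlign = channels * bitsPerSample / 8;
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36, 4);                        // chunk size, no audio data
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                       // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);                        // PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28);  // byte rate
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(0, 40);                        // data size
  return header;
}
```

With this, `isPcm16At16k(readWavFormat(someBuffer))` can be used to reject or resample audio that does not match what the model expects.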

Actually, you should be able to do everything client side.

If you really want to do a client/server architecture, then the client/server communication is really up to you, and you just have to use the deepspeech bindings server-side.

Alright, so I would follow the methods in the documentation. Also, just to be sure: DeepSpeech can be used client side, so the STT can be done there instead of having to go back to the server to transcribe?
Just a few more questions:

  1. Do I need any other dependencies besides npm install deepspeech?
  2. I do not know much about training models. Would I install the Common Voice English model on the client side? Would I place it within the Node.js library?

Yes

The minimum should be pulled by deepspeech’s declared dependencies.

You don’t have to. Please read the documentation: a crash course on using the pretrained English model is already documented.

We are talking about a native client here, right? OP didn’t specify what kind of application he is developing. For web applications, client-side only isn’t possible yet, right?

That’s right. Without more context, we can’t know.

Yes, a JavaScript application using Node.js.

Alright, in this case you need the Node server. DeepSpeech can’t run client side in the browser yet. A good way to stream the audio to the server is WebSockets. If it’s a fixed-length recording, then you can use XHR to send the blob/buffer. To capture the audio in the browser you could, for example, use the “Media Capture and Streams API” and the “MediaStream Recording API”.
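One practical detail when capturing in the browser: raw samples obtained through the Web Audio API are 32-bit floats at the device sample rate (often 44.1 or 48 kHz), while the format mentioned earlier in the thread is 16-bit PCM at 16 kHz. Below is a rough sketch of that conversion, assuming you already have a mono `Float32Array` of samples. The function names are made up for this example. The downsampler uses plain linear interpolation with no low-pass filter, so it can introduce aliasing; a proper resampler would be preferable for production use.

```javascript
// Downsample mono float samples by linear interpolation.
// Assumption: no anti-aliasing filter is applied (kept simple on purpose).
function downsample(samples, fromRate, toRate) {
  if (toRate > fromRate) throw new Error('Upsampling is not supported');
  if (toRate === fromRate) return Float32Array.from(samples);
  const ratio = fromRate / toRate;
  const outLength = Math.floor(samples.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                         // position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, samples.length - 1);
    const frac = pos - left;
    out[i] = samples[left] * (1 - frac) + samples[right] * frac;
  }
  return out;
}

// Convert floats in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatTo16BitPcm(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

For example, `floatTo16BitPcm(downsample(chunk, 48000, 16000))` gives an `Int16Array` whose underlying `buffer` could be sent over a WebSocket as binary data. Doing the conversion in the browser also cuts the upload size compared with sending the original floats.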

It’s also working quite well and is easy to write with the streaming part of the API.

And socket.io is a possible all-in-one communication layer on top of WebSockets, enabling the developer to manage both pull mode (client-server request-reply) and push mode (unsolicited notifications, e.g. audio messages, from server to clients).
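As a rough illustration of how the streaming approach and a socket-based transport could fit together on the server, here is a sketch of a per-connection handler. It only relies on `socket.on(event, listener)` and `socket.emit(event, payload)`, which socket.io sockets provide. Everything else is an assumption: the event names `audio-chunk`, `audio-end` and `transcript` are invented for this example, and `createRecognizerStream` is a hypothetical factory that you would write yourself to wrap the DeepSpeech streaming calls. The exact streaming method names and signatures depend on the DeepSpeech release, so they are deliberately hidden behind a small `{ feed(chunk), finish() }` interface here.

```javascript
// Attach streaming transcription to one connected client.
// `socket`: any object with on(event, listener) and emit(event, payload).
// `createRecognizerStream`: hypothetical factory returning
//   { feed(chunk), finish() }, where finish() returns the transcript text.
function attachTranscription(socket, createRecognizerStream) {
  let stream = null;

  // Each incoming chunk is fed to the recognizer as soon as it arrives.
  socket.on('audio-chunk', (chunk) => {
    if (stream === null) stream = createRecognizerStream();
    stream.feed(chunk);
  });

  // When the client signals the end of speech, finish and push the result.
  socket.on('audio-end', () => {
    if (stream === null) return;       // nothing was recorded
    const text = stream.finish();
    stream = null;
    socket.emit('transcript', { text });
  });

  // Drop the reference on disconnect. A real wrapper may also need to
  // release native resources here.
  socket.on('disconnect', () => {
    stream = null;
  });
}
```

On a socket.io server this would typically be called once per client inside the `connection` handler. Keeping the recognizer behind a factory also makes the handler easy to test with a fake socket and a fake recognizer.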