Client-side offline speech recognition

Hi,

Is there client-side speech recognition? A workaround that I have attempted is to write everything in Node.js, bundle it with Browserify, and include the bundle in my client-side HTML. However, this has failed to work for reasons beyond my understanding.

I would appreciate it if anyone can offer me insight into this domain.
Also, my goal is to create an offline PWA with speech recognition, is this possible?

Thank you!

Have you had a look at our documentation? https://deepspeech.readthedocs.io/ There's an API for many languages.


Yes I have. What is your take on my workaround though? Is this the approach I should be employing? Or are there other approaches to developing an offline web app with speech recognition?

On March 19, gritter97 (Ashwin selvakumar, https://discourse.mozilla.org/u/gritter97) wrote:

"Yes I have. What is your take on my workaround though? Is this the approach I should be employing? Or are there other approaches to developing an offline web app with speech recognition?"

I'm sorry, but you don't explain anything, so I don't know what your workaround is.

In my first post, I said that I wrote everything in Node.js and bundled it with Browserify to include it in my client-side HTML. However, it did not work, and I attribute that to my lack of knowledge of DeepSpeech. Hence, I would like to clarify: am I headed down the right path? How do I include it in my client-side JS?

I understand that I can stream audio from my client to my nodejs server, but I do not want that. I want it to work completely through the browser.

I have no idea what that means or what it produced.

It's non-specific. Was it an error? Something else?

No, you would need TensorFlow.js somehow, but to the best of our knowledge our model does not work with it.

Browserify lets you require() Node modules in your client-side JS. I used it because the DeepSpeech documentation covers Node.js but not client-side JS.

This was the error that I got in my developer console: "Uncaught TypeError: Cannot read property '_handle' of undefined"

I have searched for it and can't seem to find resources that explain what this means.

Okay, are you implying that I would have to train my own model to achieve this? And could I then use that model with DeepSpeech?

Please search the GitHub issues; there was already a thread about TensorFlow.js.

Also, to go along with the questions that I have just asked: essentially, I am trying to get completely offline speech recognition (in the browser) for a very limited vocabulary. My goal is to create a PWA that does exactly that. My knowledge of ML is very basic. Could you point me in the right direction (i.e. resources, tech to research, etc.) as to how I can achieve this?

I really appreciate you taking your time to respond by the way.

@gritter97 This seems like a clear and useful question. It's a shame there don't appear to be people interested in helping. Good luck. I'm on a similar mission. I'm just starting down the road, and was wondering if I could use something like WebAssembly to accomplish it. I'd be surprised if it works, but a large "install" would be acceptable for my use case. I'll follow up here if I make any headway.

@gritter97 Looks like there is a Web API for this: SpeechRecognition. Unfortunately, it has limited browser support.

https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
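For anyone landing here, a minimal sketch of feature-detecting that API, assuming a browser that exposes it either unprefixed or under the `webkit` prefix. The helper name `getSpeechRecognition` is made up for illustration. One caveat: as far as I know, some browsers (Chrome, for example) implement this API by sending audio to a server, so it may not meet the offline requirement.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor if the
// given global object exposes one (unprefixed or webkit-prefixed), else null.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Only touch the API when running in a browser window.
if (typeof window !== 'undefined') {
  const Recognition = getSpeechRecognition(window);
  if (Recognition === null) {
    console.log('SpeechRecognition is not supported in this browser.');
  } else {
    const recognizer = new Recognition();
    recognizer.lang = 'en-US';
    recognizer.interimResults = false;
    recognizer.onresult = (event) => {
      // results[0][0] is the top alternative of the first result.
      console.log('Heard:', event.results[0][0].transcript);
    };
    recognizer.onerror = (event) => console.log('Recognition error:', event.error);
    // Starting recognition prompts the user for microphone access.
    recognizer.start();
  }
}
```

Passing the global object in as a parameter keeps the detection logic separate from the browser, so it can be checked with plain objects outside a page.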

Hi @jancarius, I appreciate the resource. I wonder how it slipped past us! How is your progress? Did you ever get into WebAssembly? It seems like very interesting stuff, and I wonder what its capabilities are. I suggest you look into TensorFlow.js. I followed this tutorial: https://codelabs.developers.google.com/codelabs/tensorflowjs-audio-codelab/index.html#0 and am attempting to create a PWA that caches the included resources. The next step would be to use other models that have a larger vocabulary.

I will be sure to update this thread on my progress.
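The PWA caching step mentioned above can be sketched roughly as a service worker with a cache-first strategy. This is an illustration under assumptions, not the code from the tutorial or from any repo: the cache name, the asset list, and the `cacheFirst` helper are all hypothetical.

```javascript
// Hypothetical cache name and asset list; replace with the app's real files.
const CACHE_NAME = 'offline-stt-v1';
const ASSETS = ['./', './index.html', './index.js'];

// Cache-first lookup: serve the cached response if present, otherwise fall
// back to the network. `cache` needs a match() method; `fetchFn` acts like fetch.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  return cached !== undefined ? cached : fetchFn(request);
}

// Register handlers only when running inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('install', (event) => {
    // Pre-cache the assets so the app can load without a network.
    event.waitUntil(caches.open(CACHE_NAME).then((cache) => cache.addAll(ASSETS)));
  });
  self.addEventListener('fetch', (event) => {
    event.respondWith(
      caches.open(CACHE_NAME).then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

The page would register this file with `navigator.serviceWorker.register(...)`. Model files would need to be in the asset list as well for recognition to work offline.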

Hey guys @jancarius @bozden, I have managed to get the TensorFlow tutorial working as a PWA. Here is a link to the repo: https://github.com/Ashwin2397/Offline_STT

I made it as a proof of concept and am looking to develop a similar version with full English speech recognition. This implementation is limited to the small vocabulary of TensorFlow's model.

Based on my limited understanding, I have gathered that I would be able to produce full speech recognition by training my own model on data from the Common Voice project. I will not go down that route until I have fully explored DeepSpeech. Hence, my next course of action is to do the following:

  1. Use deepspeech in nodejs
  2. Create a bundle with webpack
  3. Deploy it as a PWA

As mentioned before, bundling the Node.js script with Browserify did not work, so I am unsure whether using webpack will be any different. I am in the middle of learning webpack now.

To those who are more experienced, I would appreciate any information or advice. Thank you!
