DeepSpeech over Web Browser

agarwalaashish20 · January 30, 2019, 11:27pm

Hi, I have created a project that makes DeepSpeech usable from a web browser. So far I haven’t found any project in the DeepSpeech examples section (DeepSpeech/examples at master · mozilla/DeepSpeech · GitHub) that covers accessing Mozilla DeepSpeech on the web.

I thought I would share it. Please have a look and let me know whether the project could be included in the DeepSpeech project.

Any suggestions and improvements are welcome.

lissyx · January 30, 2019, 11:45pm

That’s nice! However, from a quick look at the code, it’s an API using deepspeech plus client code to call it, right? And since the API is written in Python, why bother with inefficient subprocess calls, which cause the model to be reloaded from scratch each time, when you could use the Python module directly and write from deepspeech import Model?
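
To illustrate the reloading point, here is a minimal sketch of loading the model once per server process and reusing it for every request. `ModelHolder` and `load_deepspeech` are hypothetical names, not taken from the project, and the model path and `Model(...)` arguments are assumptions: the constructor signature differs between DeepSpeech releases, so check the release you have installed.

```python
import threading


class ModelHolder:
    """Loads a model once, on first use, and reuses it afterwards."""

    def __init__(self, loader):
        self._loader = loader          # callable that builds the model
        self._model = None
        self._lock = threading.Lock()  # guards the one-time load

    def get(self):
        with self._lock:
            if self._model is None:
                self._model = self._loader()
            return self._model


def load_deepspeech(model_path="output_graph.pbmm"):
    # Hypothetical loader: the path is a placeholder, and the constructor
    # arguments differ between DeepSpeech releases.
    from deepspeech import Model  # imported lazily, only when called
    return Model(model_path)


# One holder per server process; each request handler calls holder.get().
holder = ModelHolder(load_deepspeech)
```

With a subprocess call per request, the equivalent of `load_deepspeech` runs every time; here it runs at most once.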

agarwalaashish20 · February 6, 2019, 6:43pm

Thank you for the input @lissyx.

As suggested, I have made the necessary changes in the backend code. But strangely, the results are not as accurate as before. I believe it has to do with BEAM_WIDTH, LM_WEIGHT, etc., which are required to define the DeepSpeech model. Please advise.

Here is the link: DeepSpeech-API

Also, my intent for this discussion is to have the above repository forked into the Mozilla DeepSpeech examples folder: DeepSpeech Examples

Reason: We want to introduce the Mozilla DeepSpeech model to students at the university. Since we don’t want the students to go through the entire setup, we want to run DeepSpeech on a standalone server that students can use from the browser. Later, if they find it interesting, they can get involved themselves. A model available in the browser makes things really easy.
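
As a rough sketch of that setup (not the actual DeepSpeech-API code), the server side could be a small HTTP endpoint that accepts a WAV upload and returns text. Everything here is an assumption for illustration: the handler name, the 16-bit mono requirement, and the `transcribe` placeholder, which stands in for a call into a DeepSpeech model loaded at startup.

```python
import io
import sys
import wave
from array import array
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_wav(data):
    """Return (sample_rate, samples) for 16-bit mono WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = wav.getframerate()
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rate, samples


def transcribe(rate, samples):
    # Placeholder: a real server would pass the samples to a DeepSpeech
    # model that was loaded once at startup.
    return "(%d samples at %d Hz)" % (len(samples), rate)


class SttHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            rate, samples = read_wav(self.rfile.read(length))
        except (ValueError, wave.Error, EOFError) as exc:
            self.send_error(400, str(exc))
            return
        body = transcribe(rate, samples).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the endpoint until interrupted."""
    HTTPServer((host, port), SttHandler).serve_forever()
```

A browser page would then POST the recorded audio to this endpoint and display the returned text.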

Looking forward to your answer.

agarwalaashish20 · February 11, 2019, 7:26pm

Any comments on the above request? Looking forward to your answer.

carlfm01 · February 12, 2019, 3:28am

I think there’s no viable client-server setup yet. As mentioned here, the model can’t take more than one audio input at a time. @agarwalaashish20, multiple clients feeding the same deepspeech server instance can cause the things you described.

You can read about batching here
https://github.com/tensorflow/serving/tree/master/tensorflow_serving/batching
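
Short of real batching, one simple workaround is to serialize access so that only one request reaches the model at a time. The sketch below assumes a hypothetical `SerializedTranscriber` wrapper around whatever speech-to-text callable the server uses. It is not batching: requests queue up behind a lock, which trades throughput for correctness.

```python
import threading


class SerializedTranscriber:
    """Wraps a speech-to-text callable so only one call runs at a time."""

    def __init__(self, stt):
        self._stt = stt                # e.g. a bound method of a loaded model
        self._lock = threading.Lock()  # one inference at a time

    def transcribe(self, audio):
        # Concurrent callers block here until the current call finishes.
        with self._lock:
            return self._stt(audio)
```

Each request thread calls `transcribe` on the same shared instance, so the underlying model never sees two inputs at once.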

lissyx · February 12, 2019, 9:34am

Please refer to the release notes and the other client to see the proper values; it’s all there, no magic.

agarwalaashish20 · March 2, 2019, 7:19pm

@lissyx @carlfm01: Please have a look. I did some fixes. Are these changes viable for a fork?

Here is the link: DeepSpeech-API

lissyx · March 4, 2019, 2:19pm

That looks better, but I’m really, really not convinced this should land in the main repo.

agarwalaashish20 · March 5, 2019, 1:04pm

@lissyx: Please advise if you have some ideas to improve it. This implementation serves the purpose of accessing DeepSpeech from the web browser, but I am open to enhancing it. I just need your guidance so that we end up with an implementation that is sufficient to help users quickly start using DeepSpeech in the browser.

lissyx · March 5, 2019, 1:06pm

I’m not sure exactly what you mean. That’s not a DOM implementation of DeepSpeech, that’s an API exposed over HTTP. There are others, like https://pypi.org/project/deepspeech-server/ or, in Rust: Files · master · deepspeech / ds-srv · GitLab

Topdog_Mechanic · November 23, 2023, 2:10pm

Good day, good sir. Could any of the examples mentioned be used to let users control a website with their voice on the frontend, so they can browse its content? If so, could you please provide me with the information and files to do so? Thank you very much for all your hard work.