Could I do that with DeepSpeech?

Nicolas-san · January 11, 2021, 7:21pm

I want to take audio input and print it as text on the screen.

Could I do that with DeepSpeech?

I'm a developer. I can write Python even though it's not my primary language, but I'm new to deep learning and STT.

Eventually I want to do this on a Raspberry Pi, but I'm having trouble adding a good audio-jack input, since the Raspberry Pi doesn't have one built in. That's a hardware question, though, and I can switch to an ODROID or a similar board once I have something working.

I will do my tests on my computer (Ubuntu), with the standard microphone input.

Thanks

P.S.: If I can't, what can I use?

Have a really nice day

baconator · January 12, 2021, 5:53am

Yes. The DeepSpeech examples show a way to do this, and you can customize them yourself in Python as well.
https://deepspeech.readthedocs.io/en/latest/USING.html#using-the-command-line-client
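For the file-based case, here is a minimal sketch in the spirit of the usage docs. It assumes the `deepspeech` 0.9.x Python package and numpy are installed; `read_pcm16_mono` and `transcribe_file` are hypothetical helper names, and the model, scorer, and WAV paths are placeholders you supply. DeepSpeech expects 16-bit mono PCM at `model.sampleRate()` (typically 16 kHz), so the helper rejects anything else rather than guessing.

```python
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Return the raw 16-bit PCM bytes of a mono WAV file.

    Raises ValueError if the file is not 16-bit mono at expected_rate.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe_file(model_path, wav_path, scorer_path=None):
    """Run one batch (non-streaming) transcription of a WAV file."""
    # Imported here so read_pcm16_mono stays usable without these packages.
    import numpy as np
    import deepspeech

    model = deepspeech.Model(model_path)
    if scorer_path:
        model.enableExternalScorer(scorer_path)
    pcm = read_pcm16_mono(wav_path, model.sampleRate())
    # The binding takes int16 samples as a numpy array.
    audio = np.frombuffer(pcm, dtype=np.int16)
    return model.stt(audio)
```

For example, `transcribe_file("deepspeech-0.9.3-models.pbmm", "audio.wav", "deepspeech-0.9.3-models.scorer")` with the released English model files; for French, substitute the commonvoice-fr model and scorer.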

lissyx · January 12, 2021, 8:30am

Just take a look at https://github.com/mozilla/DeepSpeech/blob/master/native_client/python/client.py to re-implement this as you like.
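One step a reimplementation has to keep: as far as I can tell, client.py warns when the input sample rate differs from the model's and resamples the file (by shelling out to SoX) before inference. Microphones and call audio often run at 44.1 or 48 kHz, so this matters for live input. The sketch below swaps in naive linear interpolation on a list of int16 samples; `resample_linear` is a hypothetical helper, and because it applies no anti-aliasing low-pass filter, SoX or a proper resampling library should give better recognition.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a list of int16 samples by linear interpolation.

    No anti-aliasing filter is applied, so downsampling may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples; it stays
        # within the int16 range because both neighbours do.
        value = samples[left] * (1.0 - frac) + samples[right] * frac
        out.append(round(value))
    return out
```

The result can be converted with `numpy.array(out, dtype=numpy.int16)` before handing it to DeepSpeech.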

There are bindings in other languages, so you can also write it in C, C++, JS, .NET, …

Nicolas-san · February 21, 2021, 11:05pm

Thanks for your response, and sorry for the long delay.

I tested the Node.js microphone example from the DeepSpeech-examples repo on GitHub, using the French model from commonvoice-fr (and thanks for that, by the way).

But I think I was mistaken about how I want to use it: my goal is to use it as a real-time transcriber for video calls (with Jitsi, for example), so I guess I have to figure out how to capture the audio before it reaches the output device.

I haven't been able to run the Python mic example, but perhaps my Python setup was wrong (I use PyCharm with Python 3.8). I haven't spent much time debugging it; my first goal is to find out whether the model is accurate enough for real-life conversation.

othiele · February 22, 2021, 12:34pm

Have a look at the streaming examples.
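The streaming examples build on DeepSpeech's streaming API instead of a single `stt()` call. A minimal sketch, assuming the 0.9.x Python binding (`Model.createStream()`, `Stream.feedAudioContent()`, `Stream.intermediateDecode()`, `Stream.finishStream()`, `Stream.freeStream()`); `transcribe_stream` is a hypothetical helper, and `chunks` is assumed to be any iterable of int16 numpy arrays at the model's sample rate, such as successive microphone buffers. The model is passed in as a parameter so the loop can also be exercised with a stand-in object.

```python
def transcribe_stream(model, chunks, on_partial=None):
    """Feed audio chunks to a DeepSpeech stream and return the final text.

    model:      a deepspeech.Model (or any object with the same methods).
    chunks:     iterable of int16 numpy arrays at model.sampleRate().
    on_partial: optional callback receiving the running hypothesis after
                each chunk, e.g. to refresh live captions.
    """
    stream = model.createStream()
    try:
        for chunk in chunks:
            stream.feedAudioContent(chunk)
            if on_partial is not None:
                on_partial(stream.intermediateDecode())
    except BaseException:
        # Discard the stream state without decoding if feeding fails.
        stream.freeStream()
        raise
    # Decode everything fed so far and end the stream.
    return stream.finishStream()
```

Calling `intermediateDecode()` after every small chunk costs extra decoding work; for live captions it may be enough to request a partial result only every few chunks.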

Nicolas-san · February 22, 2021, 1:35pm

Can you point me to a specific example?
What I was using at first was the mic streaming one.
But perhaps what I really need to look into is how to capture the output sound.

othiele · February 22, 2021, 1:47pm

I have no idea what you are talking about, sorry.

DeepSpeech takes streaming audio input and produces text output. It looks like you would want to grab Jitsi's output first and then feed it to DS, but how to do that is up to you.
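One possible way to grab call audio on an Ubuntu desktop, assuming PulseAudio (or PipeWire's PulseAudio compatibility layer): record the output sink's monitor source with `parec` and pipe raw PCM into a Python script, e.g. `parec -d <sink>.monitor --format=s16le --rate=16000 --channels=1 | python3 live_captions.py`. The sink name is machine-specific (`pactl list short sources` lists the monitor sources), and `live_captions.py` is a hypothetical script name. The sketch covers only the Python side: `pcm_chunks` is a hypothetical helper that splits a raw 16-bit byte stream into fixed-duration chunks.

```python
def pcm_chunks(source, sample_rate=16000, chunk_ms=20, sample_width=2):
    """Yield fixed-size chunks of raw mono PCM bytes from a binary file object.

    Each full chunk holds chunk_ms milliseconds of audio; a shorter trailing
    chunk is yielded at end of stream if whole samples are left over.
    """
    chunk_bytes = sample_rate * chunk_ms // 1000 * sample_width
    buffer = b""
    while True:
        # Ask only for what is missing; pipes may return short reads.
        data = source.read(chunk_bytes - len(buffer))
        if not data:
            break  # end of stream
        buffer += data
        if len(buffer) == chunk_bytes:
            yield buffer
            buffer = b""
    # Drop any incomplete trailing sample so the bytes decode as int16.
    buffer = buffer[: len(buffer) - len(buffer) % sample_width]
    if buffer:
        yield buffer
```

In the receiving script, `for chunk in pcm_chunks(sys.stdin.buffer)` followed by `numpy.frombuffer(chunk, dtype=numpy.int16)` would turn each chunk into samples that a DeepSpeech stream can consume.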

Nicolas-san · February 22, 2021, 1:56pm

Yes, you're correct.

I have to work on this part.
Thanks!

eric · May 12, 2021, 3:48am

Nicolas-san, did you manage to make it work on Jitsi?

Nicolas-san · May 6, 2022, 1:33pm

Almost a year to respond… sorry. But no, I didn't pursue this project.