Could I do that with deepspeech?

I want to take audio input and print the transcription on the screen.

Could I do that with DeepSpeech?

I am a dev, and I can write Python even though it’s not my primary language, but I’m new to deep learning and STT.

In the end I want to do it on a Raspberry Pi, but I’m having trouble adding a good audio jack input, since the Raspberry Pi does not have one built in. That’s a hardware question, though, and I can switch to an Odroid or something similar once I have something working.

I will do my tests on my computer (Ubuntu), with the standard mic input.

Thanks

PS: if I cannot, what can I use?

Have a really nice day :slight_smile:

Yes. The DeepSpeech examples show a way to do this, and you can customize it on your own with Python as well.
https://deepspeech.readthedocs.io/en/latest/USING.html#using-the-command-line-client

Just take a look at https://github.com/mozilla/DeepSpeech/blob/master/native_client/python/client.py to re-implement this as you like.

There are bindings for other languages too, so you can write it in C, C++, JS, .NET, …
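
For readers who want a starting point, here is a minimal sketch of what a stripped-down re-implementation might look like in Python. It assumes a DeepSpeech 0.7 or later Python package (where `Model` takes only the model path and exposes `sampleRate()` and `stt()`) and a 16-bit mono WAV file already at the model's sample rate. The helper names `read_pcm16` and `transcribe_file` are made up for this illustration and are not part of `client.py`.

```python
import wave


def read_pcm16(path, expected_rate=16000):
    """Return the raw 16-bit mono PCM bytes of a WAV file, checking its format."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit mono WAV file")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                "expected %d Hz, got %d Hz" % (expected_rate, wav.getframerate())
            )
        return wav.readframes(wav.getnframes())


def transcribe_file(model_path, wav_path):
    """Transcribe a whole WAV file in one call (hypothetical helper)."""
    # Imported here so read_pcm16 stays usable without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    pcm = read_pcm16(wav_path, expected_rate=model.sampleRate())
    # DeepSpeech expects a NumPy array of 16-bit samples.
    audio = np.frombuffer(pcm, dtype=np.int16)
    return model.stt(audio)
```

Calling `print(transcribe_file(model_path, wav_path))` with your own model and recording would print the text. Unlike `client.py`, this sketch does no resampling, so a file at the wrong sample rate is rejected instead of converted.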

thanks for your response, and sorry for the long delay :wink:

I have tested with the Node.js mic example found in the DeepSpeech-examples repo on GitHub.

I used the French model from commonvoice-fr (and thanks for that, by the way :slight_smile: ).

But I think I was mistaken about how I want to use it. My goal is to use it as a real-time transcriber for video calls (with Jitsi, for example), so I guess I have to figure out how to capture the audio before it gets to the output device.

I have been unable to run the Python mic example, but perhaps my Python setup was not right. I use PyCharm with Python 3.8. I have not spent much time debugging it; my first goal is to figure out whether the model is accurate enough for real-life conversation.

Have a look at the streaming examples.

Can you point to a specific example?
What I used first was the mic streaming example,
but perhaps what I need to look into is how to capture the output sound.

I have no idea what you are talking about, sorry.

DeepSpeech takes audio streaming input and produces textual output. Looks like you would want to grab jitsi’s output first and then feed it to DS. But how to do that is up to you.
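
To make the "feed it to DS" half concrete, here is a rough sketch of the streaming side in Python. It assumes the streaming API of the DeepSpeech 0.7 or later Python package (`createStream()`, `feedAudioContent()`, `intermediateDecode()`, `finishStream()`), and that `chunks` yields NumPy `int16` arrays of mono audio at the model's sample rate. The function name `transcribe_chunks` is hypothetical, and capturing Jitsi's audio to produce those chunks is deliberately left out.

```python
def transcribe_chunks(model, chunks):
    """Feed audio chunks to a DeepSpeech stream.

    Yields the partial transcript after each chunk, then the final
    transcript once the chunk source is exhausted.
    """
    stream = model.createStream()
    for chunk in chunks:
        # Each chunk is assumed to be 16-bit mono audio at the model's rate.
        stream.feedAudioContent(chunk)
        yield stream.intermediateDecode()
    # finishStream() closes the stream and returns the final text.
    yield stream.finishStream()
```

With a real model you would create it with `deepspeech.Model(model_path)` and loop over `transcribe_chunks(model, chunks)`, printing each result as it arrives. Decoding after every chunk is the simplest thing to write but may be too slow on small hardware, so decoding every few chunks could be a better trade-off.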

yes, you’re correct :slight_smile:

I have to work on this part,
thanks

Nicolas-san, did you manage to make it work on Jitsi?

Almost one year to respond … sorry, but no, I didn’t pursue this project.