Conditionals

Hi. I am new to DeepSpeech and have two questions:

  1. Is it possible to use DeepSpeech for wake word / hotword detection? In other words, recognition of my voice should begin only after I say a certain word.

  2. Is it possible to put conditionals in mic_vad_streaming.py?
    https://github.com/mozilla/DeepSpeech-examples/tree/r0.8/mic_vad_streaming
    That is, like a voice assistant: if I say X, it does Y. Any advice on how to do this?

Regards.

Yes, but it looks like you are thinking of something like a voice assistant. Have a look at Jaco, which uses DeepSpeech for an assistant. If you want to build it yourself, there is a new hot word feature, but it will take a while before it reaches a release.

Look at something like Jovo if you don’t want to use Jaco.
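
To make the "if I say X, do Y" part concrete, here is a minimal sketch of how conditionals could look once you have the recognized text. It assumes you call `handle_transcript` at the point in mic_vad_streaming.py where the transcript of an utterance is available (in the r0.8 example that should be the string returned by `finishStream()`, but please check your copy). The phrases and the two action functions are made up for illustration.

```python
def turn_on_light():
    # Hypothetical action; replace with your own code (GPIO, HTTP call, ...).
    return "light on"


def tell_time():
    # Hypothetical action.
    return "telling time"


# Hypothetical table mapping a spoken phrase to the function to run.
COMMANDS = {
    "turn on the light": turn_on_light,
    "what time is it": tell_time,
}


def handle_transcript(text, commands=COMMANDS):
    """Run the first action whose phrase occurs in the transcript.

    Returns the action's result, or None if no phrase matched.
    """
    # Lowercase and collapse whitespace so matching is less brittle.
    normalized = " ".join(text.lower().split())
    for phrase, action in commands.items():
        if phrase in normalized:
            return action()
    return None
```

Substring matching is the simplest possible choice here. Since transcripts are rarely perfect, fuzzy matching or a proper intent parser would probably be more robust.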


There are actually other assistants like Rhasspy that use DeepSpeech, but I prefer to build it myself so the code is optimized for exactly what I want. My current problems are the wake word and that the VAD does not work the way I need, that is, starting to record immediately on hearing the wake word.

I would appreciate any advice.

Here are my 2 cents on that. DeepSpeech is quite slow on CPUs or Raspberry Pis, so you would have to record almost all of the time to catch the first word or two from the user. An approach that doesn’t use such a big neural net might be better suited. I think there is something from Picovoice and some others. Maybe it is best to split this task.

But if you can, try the new hotword feature in the master branch. Maybe with a tflite model and small scorer it can be quite effective. It is still quite new.
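
If you still want to try wake word handling on top of the transcripts, here is a rough sketch of a gate that ignores everything until the wake word shows up. Note that this is not the hotword feature from master, just plain string logic on the recognized text, so it keeps the "record almost all of the time" cost mentioned above. The class name and the wake word "computer" are made up.

```python
class WakeWordGate:
    """Pass a transcript on as a command only after the wake word was heard."""

    def __init__(self, wake_word):
        self.wake_word = wake_word.lower()
        self.awake = False  # True while waiting for the command utterance

    def feed(self, transcript):
        """Return the command text to act on, or None if there is none yet."""
        words = transcript.lower().split()
        if self.awake:
            if not words:
                return None  # empty utterance: keep waiting for the command
            self.awake = False
            return " ".join(words)
        if self.wake_word in words:
            # Anything said after the wake word in the same utterance
            # is treated as the command.
            rest = words[words.index(self.wake_word) + 1:]
            if rest:
                return " ".join(rest)
            # Wake word alone: treat the next utterance as the command.
            self.awake = True
        return None
```

Each utterance the VAD cuts out would be transcribed and passed to `feed`; only a non-None result would then go on to your conditionals.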


Regarding the wake word, I am using this: snipsco/snips-record-personal-hotword (on GitHub).
Apparently it works well.

Regarding what you said about STT on CPU: I use a Raspberry Pi 4 (4 GB). Are you telling me that it is slow?
Recognition takes about 1.37 seconds on my Raspberry Pi. I would guess that Rhasspy (synesthesiam/rhasspy on GitHub, a voice assistant for offline home automation) takes up to 3 seconds (I did not measure the time in Rhasspy, it is just a mental estimate). Do you think Rhasspy is slower because it uses version 0.6, or does the version not affect this directly?

By the way, thanks for your reply.

Yes, we can now use multiple threads with TFLite, so it’s much faster than in those releases.


For future reference, also try Mycroft’s Precise.


Please also note that processing time depends on how much audio you process.
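
One way to make timings comparable across setups is to divide the processing time by the length of the audio (often called the real-time factor). A small sketch, assuming 16 kHz audio as used by the released English models; the function names are made up and the numbers in the usage note are only an example, not a measurement.

```python
def audio_seconds(num_samples, sample_rate=16000):
    """Duration of a mono audio buffer in seconds."""
    return num_samples / sample_rate


def real_time_factor(processing_seconds, num_samples, sample_rate=16000):
    """Processing time divided by audio duration.

    Values below 1.0 mean recognition runs faster than real time.
    """
    return processing_seconds / audio_seconds(num_samples, sample_rate)
```

For example, spending 1.5 seconds on 48000 samples (3 seconds of audio at 16 kHz) gives a factor of 0.5. A single figure like "1.37 seconds" says little without knowing how long the clip was.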
