DeepSpeech: Unidirectional or Bidirectional LSTM?

Hi
Is the LSTM layer in DeepSpeech v0.6 unidirectional or bidirectional?
According to Baidu’s paper, it is bidirectional, but it seems to be unidirectional in Mozilla’s Deep Speech.
Is that correct?

That’s correct; this is what we document at https://deepspeech.readthedocs.io/en/v0.6.0/DeepSpeech.html. Please send a PR or file an issue if you think the wording of the documentation needs improvement.
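If it helps to picture the difference, here is a minimal sketch. This is not DeepSpeech’s actual graph, just illustrative Keras layers with made-up sizes: a unidirectional LSTM at frame t only looks at past frames, while a bidirectional one also consumes future frames, which is why it needs the whole utterance.

```python
# Minimal sketch of the architectural difference; NOT DeepSpeech's actual graph,
# and the layer sizes are made up for illustration.
import tensorflow as tf

n_features = 26   # hypothetical per-frame feature size
n_hidden = 32     # tiny hidden size for the demo (the real model is much larger)

# Unidirectional: the output at frame t depends only on frames 0..t,
# so it can be computed while audio is still arriving.
uni = tf.keras.layers.LSTM(n_hidden, return_sequences=True)

# Bidirectional: the output at frame t also depends on frames t..T,
# so the whole utterance must be available before anything is emitted.
bi = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(n_hidden, return_sequences=True))

frames = tf.random.normal([1, 100, n_features])  # (batch, time, features)
print(uni(frames).shape)  # (1, 100, 32)
print(bi(frames).shape)   # (1, 100, 64) -- forward and backward outputs concatenated
```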


Thank you for your fast response.

That’s correct

That leads to another question.
Why can’t DeepSpeech generate output in real time?
Why does it need the whole audio file before it generates output?

DeepSpeech is capable of streaming and can generate output faster than real time with appropriate hardware. We have extensive documentation on the streaming API as well as several examples.
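For reference, here is a rough sketch of what the streaming flow looks like with the Python bindings. The call names (Model, createStream, feedAudioContent, intermediateDecode, finishStream) and the beam-width argument follow the v0.6-era API as I understand it; the model and audio paths are placeholders, so please check the readthedocs pages for the exact signatures of your release.

```python
# Rough sketch of streaming with the Python bindings (v0.6-era API assumed).
import wave
import numpy as np
import deepspeech

model = deepspeech.Model('output_graph.pbmm', 500)   # placeholder model path and beam width

stream = model.createStream()
with wave.open('audio.wav', 'rb') as wav:            # assumed: 16 kHz, 16-bit mono PCM
    while True:
        chunk = wav.readframes(320)                  # ~20 ms of audio at 16 kHz
        if not chunk:
            break
        model.feedAudioContent(stream, np.frombuffer(chunk, dtype=np.int16))
        print(model.intermediateDecode(stream))      # partial transcript so far

print(model.finishStream(stream))                    # final transcript
```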


Would you mind directing me to this documentation? I’ve been searching around and am having trouble finding proper resources regarding streaming audio into DeepSpeech.

Thanks

Have you had a look at the README? https://github.com/mozilla/DeepSpeech#project-deepspeech It also contains links to readthedocs.

Hello!
Thanks for these interesting replies.
Does that mean that DeepSpeech uses a unidirectional RNN in streaming mode and a bidirectional one when it takes the whole audio file as input?

And if it uses a unidirectional RNN for both modes, why is streaming mode faster than the default mode?

Thanks

The neural net always takes small time slices (20 ms) as input. The difference between classic and streaming is just whether you slice everything up at the end or do so continuously; there is no difference in the net. You do see a difference in the scorer though, as the result changes with more and more words, because the scorer can give better predictions with more input. But they yield the same result at the end.
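To make that concrete, here is a sketch (again assuming the v0.6-era Python API, with placeholder file names) of feeding the same utterance once as a whole buffer and once in 20 ms slices: only the intermediate results differ, and the final transcripts should match.

```python
# Sketch comparing classic and streaming modes (v0.6-era API assumed, placeholder paths).
import wave
import numpy as np
import deepspeech

model = deepspeech.Model('output_graph.pbmm', 500)    # placeholder model path and beam width

with wave.open('audio.wav', 'rb') as wav:             # assumed: 16 kHz, 16-bit mono PCM
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# "Classic" mode: hand over the complete utterance in one call.
full_result = model.stt(audio)

# Streaming mode: feed the same samples 20 ms (320 samples) at a time.
stream = model.createStream()
for start in range(0, len(audio), 320):
    model.feedAudioContent(stream, audio[start:start + 320])
streamed_result = model.finishStream(stream)

print(full_result == streamed_result)                 # expected: True
```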

It’s very clear, thanks a lot!