Is the LSTM layer in Deep Speech v0.6 Unidirectional or Bidirectional?
According to Baidu’s paper, it is bidirectional, but it seems to be unidirectional in Mozilla’s Deep Speech.
Is that correct?
That’s correct: Mozilla’s DeepSpeech uses a unidirectional LSTM. This is what we document on https://deepspeech.readthedocs.io/en/v0.6.0/DeepSpeech.html. Please send a PR or open an issue if you think the wording of the documentation needs improvement.
Thank you for your fast response.
That leads to another question.
Why can’t Deep Speech generate output in real time?
Why does it need the whole audio file to generate output?
DeepSpeech is capable of streaming and can generate output faster than real time with appropriate hardware. We have extensive documentation on the streaming API as well as several examples.
Would you mind directing me to this documentation? I’ve been searching around and am having trouble finding proper resources regarding streaming audio into DeepSpeech.
Have you had a look at the README? https://github.com/mozilla/DeepSpeech#project-deepspeech It also contains links to the readthedocs documentation.
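For readers looking for a starting point, here is a minimal sketch of what feeding audio chunk by chunk could look like from Python. It assumes the v0.6-style bindings, where the stream handle is passed back to the model through `createStream`, `feedAudioContent` and `finishStream`; as far as I know, later releases moved these calls onto the stream object, so please check the names against the docs for your version. The helper names `iter_chunks` and `transcribe_streaming` and the chunk size are made up for illustration, and `model` is assumed to be an already loaded `deepspeech` Model.

```python
def iter_chunks(samples, chunk_size):
    """Yield consecutive slices of at most chunk_size samples."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def transcribe_streaming(model, samples, chunk_size=320):
    """Feed audio to the model piece by piece and return the final text.

    Assumptions (not taken from the thread):
    - model exposes the v0.6-style createStream / feedAudioContent /
      finishStream methods;
    - samples is 16 kHz mono 16-bit audio (a NumPy int16 array when used
      with the real bindings), so 320 samples is 20 ms of audio.
    """
    stream = model.createStream()
    for chunk in iter_chunks(samples, chunk_size):
        # In a live setup each chunk would come from the microphone
        # as soon as it is captured, not from a preloaded array.
        model.feedAudioContent(stream, chunk)
    # Closes the stream and returns the decoded transcript.
    return model.finishStream(stream)
```

The point of the sketch is that nothing here needs the whole file up front: the loop could just as well pull chunks from a microphone callback, which is possible because the unidirectional LSTM only looks at past audio.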