This is in reference to https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/.
The blog post describes the performance gain in terms of time and resource consumption.
I was wondering: in terms of STT quality, was the bidirectional LSTM more accurate? Or, since you use only a very small window of audio to infer the text (for streaming purposes), does the backward context (which the bidirectional LSTM has but the unidirectional one does not) not matter as much?
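
To make clear what I mean by backward context, here is a toy sketch. It is not DeepSpeech code: `run_rnn` is a hypothetical scalar recurrence standing in for an LSTM, and the weights and frame values are made up. It only shows that the output for the first frame of a bidirectional model depends on later frames, while the unidirectional output does not.

```python
import math


def run_rnn(frames, w_in=0.5, w_rec=0.9):
    """Toy recurrence h_t = tanh(w_in * x_t + w_rec * h_{t-1}).

    A stand-in for an LSTM; the weights are arbitrary illustration values.
    """
    h = 0.0
    outputs = []
    for x in frames:
        h = math.tanh(w_in * x + w_rec * h)
        outputs.append(h)
    return outputs


def unidirectional(frames):
    # Each output sees only the current and earlier frames.
    return run_rnn(frames)


def bidirectional(frames):
    # Forward pass sees the past; backward pass sees the future.
    forward = run_rnn(frames)
    backward = run_rnn(frames[::-1])[::-1]
    return list(zip(forward, backward))


frames = [0.1, 0.4, -0.2, 0.7]
changed = frames[:-1] + [-0.7]  # alter only the last (future) frame

# Does the output for the FIRST frame stay the same when a later frame changes?
uni_same = unidirectional(frames)[0] == unidirectional(changed)[0]
bi_same = bidirectional(frames)[0] == bidirectional(changed)[0]
print(uni_same, bi_same)
```

My question is essentially whether this extra dependence on future frames made a measurable difference to accuracy in practice.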