Uni-directional model for online/incremental ASR in real-time applications e.g. voice assistants

oplatek · April 3, 2018, 5:35am

Is there a plan for uni-directional models?
Are pull requests for uni-directional models welcome? What would the requirements be?

Motivation:
Left-to-right uni-directional models would allow ASR to be used in real-time applications,
because they can decode speech without seeing the whole audio in advance, so latency is minimal. They decode as the user speaks.
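
To illustrate the point, here is a minimal sketch of why a forward-only recurrence can be decoded incrementally. It is a toy scalar RNN, not the project's model: `rnn_step`, `run_forward`, `run_streaming` and the weights `w_x` and `w_h` are hypothetical names and values chosen only for this example. Because each state depends only on past frames, feeding the audio in chunks while carrying the state forward gives the same outputs as processing the whole utterance at once. A bi-directional model cannot do this, since its backward pass needs the last frame first.

```python
import math


def rnn_step(h, x, w_x=0.5, w_h=0.9):
    """One forward-in-time update: the new state depends only on the past."""
    return math.tanh(w_x * x + w_h * h)


def run_forward(frames, h=0.0):
    """Process frames left to right; return per-frame outputs and the final state."""
    outputs = []
    for x in frames:
        h = rnn_step(h, x)
        outputs.append(h)
    return outputs, h


def run_streaming(chunks):
    """Consume audio chunk by chunk, carrying the state across chunk boundaries."""
    h = 0.0
    outputs = []
    for chunk in chunks:
        chunk_out, h = run_forward(chunk, h)
        # These outputs are available before any later chunk has arrived.
        outputs.extend(chunk_out)
    return outputs


# Toy "feature frames" standing in for audio features.
frames = [0.1, -0.3, 0.7, 0.2, -0.5, 0.4]
offline, _ = run_forward(frames)
streamed = run_streaming([frames[:2], frames[2:4], frames[4:]])
print(streamed == offline)
```

In this toy setup the latency is bounded by the chunk size rather than by the length of the utterance.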

Thank you

Oplatek

PS: Some related questions:

RTF: real-time ASR needs a real-time factor (RTF) below 1.0. See "Inference time run speeds".
Architecture: a similar question, though without very helpful answers, is "Any Good Architecture for continous Inference".
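
For reference, RTF is the processing time divided by the duration of the audio, so RTF < 1.0 means decoding keeps up with the speaker. A minimal sketch of how it could be measured follows. The helper names `real_time_factor` and `measure_rtf` are made up for this example, and `decode` is a placeholder for whatever inference call is being timed, not an actual API of the project.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio that was decoded."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(decode, audio, audio_seconds):
    """Time a single call to `decode` (a placeholder callable) and return its RTF."""
    start = time.perf_counter()
    decode(audio)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Example: 2.5 s of compute for 10 s of audio is faster than real time.
rtf = real_time_factor(2.5, 10.0)
print(rtf, rtf < 1.0)
```

Note that RTF < 1.0 is necessary but not sufficient for low latency: a bi-directional model can have a low RTF and still produce nothing until the whole utterance has been received.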

kdavis · April 3, 2018, 8:02am

Yes, we are currently working on this in a branch[1].

jageshmaharjan · July 10, 2018, 3:53am

Keeping an eye on it