Is it possible to extract words as the model Stream is running?

I’m currently using the streaming code from this article written by Reuben (https://hacks.mozilla.org/2018/09/speech-recognition-deepspeech/). The code is working great, and I’m blown away by the quality of DeepSpeech!

I am wondering whether it is possible to extract words from the streaming processor as they are produced. Assuming the words are processed in real time (or close to it), could I read them without ending the stream? As of now, the program only outputs the transcript when I call model.finishStream().

Thanks!

That’s what DS_IntermediateDecode() does: https://github.com/mozilla/DeepSpeech/blob/41c3ffbed2d9e6c8e00522353115f373b48573db/native_client/deepspeech.h#L181-L193

Note that DS_IntermediateDecode is currently very expensive, so you can’t keep calling it indefinitely. You need a voice activity detection module to find silence points and call DS_FinishStream() there, so that the stream doesn’t go on for too long. Making DS_IntermediateDecode fast requires a streaming decoder, which is on the backlog but which I haven’t had time to work on. If anyone wants to work on this, I can give guidance.
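
To make the pattern concrete, here is a rough sketch in Python of the loop described above: feed audio frames, ask for a partial transcript only every so often, and finish the stream once the voice activity detector reports enough silence. It is not code from DeepSpeech itself. `StreamLike`, its `feed` / `intermediate` / `finish` methods, `new_stream`, `is_speech`, and the two thresholds are all hypothetical names for illustration; you would write a thin adapter that maps them to your binding's equivalents of DS_FeedAudioContent, DS_IntermediateDecode and DS_FinishStream, and plug in whatever VAD you use.

```python
from typing import Callable, Iterable, Iterator, Protocol, Tuple


class StreamLike(Protocol):
    """Hypothetical adapter over one DeepSpeech stream (names are assumptions)."""

    def feed(self, frame: bytes) -> None: ...   # wraps the feed-audio call
    def intermediate(self) -> str: ...          # wraps the intermediate decode call
    def finish(self) -> str: ...                # wraps the finish-stream call


def transcribe(
    frames: Iterable[bytes],
    new_stream: Callable[[], StreamLike],
    is_speech: Callable[[bytes], bool],
    decode_every: int = 25,   # assumed value: frames between partial decodes
    max_silence: int = 15,    # assumed value: silent frames that end an utterance
) -> Iterator[Tuple[str, str]]:
    """Yield ("partial", text) while speaking and ("final", text) at each silence point."""
    stream = None
    fed = 0
    silence = 0
    for frame in frames:
        speech = is_speech(frame)
        if stream is None:
            if not speech:
                continue  # skip silence between utterances
            stream = new_stream()
            fed = 0
            silence = 0
        stream.feed(frame)
        fed += 1
        silence = 0 if speech else silence + 1
        if silence >= max_silence:
            # Silence point: close the stream so it never grows too long.
            yield ("final", stream.finish())
            stream = None
        elif fed % decode_every == 0:
            # Throttled partial result; decoding on every frame would be too costly.
            yield ("partial", stream.intermediate())
    if stream is not None:
        yield ("final", stream.finish())
```

A caller would iterate over `transcribe(...)` and print or display each event as it arrives. Raising `decode_every` trades latency of partial results for less decoding work, and `max_silence` controls how quickly an utterance is closed.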

@reuben I would be interested in looking into this if you let me know what the requirements / expected features should be.

@dabinat Awesome! I’ve made a comment over on the feature request issue explaining the idea: https://github.com/mozilla/DeepSpeech/issues/1837#issuecomment-491444161
