Punctuation Model

So if one were to use DeepSpeech for live transcription (like Google’s Live Transcribe, for instance), they would need to implement a punctuation model as well.

Has anyone looked into how this might work, or are there already Python libraries that do this? The idea would be to run your DeepSpeech output through such a model and hopefully get well-punctuated, more readable sentences.
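Roughly the shape I have in mind, as a minimal sketch only: the transcript comes out of the recognizer as plain lowercase text and is handed to a separate punctuation step. Everything named here is hypothetical; `naive_punctuate` is just a stand-in so the example runs, and a real model would replace it.

```python
from typing import Callable


def naive_punctuate(text: str) -> str:
    """Stand-in for a real punctuation model (hypothetical).

    It only capitalises the first letter and appends a full stop.
    """
    text = text.strip()
    if not text:
        return text
    return text[0].upper() + text[1:] + "."


def postprocess(raw_transcript: str,
                punctuate: Callable[[str], str] = naive_punctuate) -> str:
    """Pass raw, unpunctuated speech-to-text output through a punctuation step."""
    return punctuate(raw_transcript)


print(postprocess("hello world how are you"))
```

Keeping the punctuation step behind a plain callable means any library or model could be swapped in later without touching the transcription code.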

I know that this issue may be out of scope for the DeepSpeech project, but it is relevant to the overall context in which a DeepSpeech solution might be used. So, thoughts?

I’m a little surprised you didn’t find this by Googling, as a Stack Overflow answer comes up and it’s linked to a Quora answer on this topic too. Still, it’s an interesting thing to discuss, as I expect plenty of people would want to use this in conjunction with DeepSpeech.

They give some background and point to this GitHub project which looks promising:

It would be interesting to hear how you get on if you explore using this repo (or pursue other options).

Someone brought it up in this forum too:

I have not tested this thoroughly, but I’m wondering how far you could get by just keeping punctuation in the datasets and in the language model. I remember doing that by mistake, and the model did indeed learn to add commas and other marks at audible pauses.

Might not be as efficient as a real model dedicated to it, though.
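To make "keeping punctuation in the datasets" concrete, here is a minimal sketch of a transcript normalizer that lowercases text but leaves a small set of marks in place instead of stripping them. The function name and the choice of marks in `KEEP` are assumptions for illustration, not anything taken from the DeepSpeech import scripts, and the acoustic model's alphabet would presumably need the same characters added.

```python
# Marks to leave in the training text (an assumption; pick your own set).
KEEP = ",.?"

LETTERS = "abcdefghijklmnopqrstuvwxyz' "


def normalize_keep_punct(line: str, keep: str = KEEP) -> str:
    """Lowercase a transcript and drop every character except letters,
    apostrophes, spaces and the marks listed in `keep`."""
    allowed = set(LETTERS) | set(keep)
    cleaned = "".join(ch if ch in allowed else " " for ch in line.lower())
    # Collapse the runs of spaces left behind by removed characters.
    return " ".join(cleaned.split())


print(normalize_keep_punct("Hello, world! How are you?"))
print(normalize_keep_punct("Hello, world! How are you?", keep=""))
```

Passing `keep=""` gives the usual fully stripped text, so the same function can produce both variants of a corpus for comparison.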

Interesting find. Didn’t come across that project during my search, but chanced upon the following:
Deep Correct

I do plan on trying both repositories out in the future, once I get to that point in my project. Punctuator2 seems to use a bidirectional RNN, so it might not work as well with DeepSpeech’s streaming capabilities, but that will need to be explored.
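One possible way around the streaming concern, sketched under assumptions: hold each word back until a few later words have arrived, so a model that needs right-hand context can still see some before the word is shown. This is not how Punctuator2 works internally; `LookaheadPunctuator` and `toy_punctuate` are hypothetical names, and the toy function (comma before "but") only stands in for a real model.

```python
from collections import deque
from typing import Callable, List


def toy_punctuate(words: List[str]) -> List[str]:
    """Stand-in model (hypothetical): adds a comma to any word followed by 'but'."""
    out = []
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        out.append(word + "," if nxt == "but" else word)
    return out


class LookaheadPunctuator:
    """Emit each word only after `lookahead` later words have been seen."""

    def __init__(self, punctuate: Callable[[List[str]], List[str]],
                 lookahead: int = 3, history: int = 20) -> None:
        self.punctuate = punctuate
        self.lookahead = lookahead
        self.history = deque(maxlen=history)  # raw words already emitted
        self.pending = deque()                # raw words awaiting right context

    def push(self, word: str) -> List[str]:
        """Add one streamed word; return any words that are now final."""
        self.pending.append(word)
        out = []
        while len(self.pending) > self.lookahead:
            out.append(self._emit_one())
        return out

    def flush(self) -> List[str]:
        """End of stream: emit whatever is still pending."""
        out = []
        while self.pending:
            out.append(self._emit_one())
        return out

    def _emit_one(self) -> str:
        # Run the model over left context plus pending words, then keep
        # only the label for the oldest pending word.
        window = list(self.history) + list(self.pending)
        labelled = self.punctuate(window)
        index = len(self.history)
        self.history.append(self.pending.popleft())
        return labelled[index]


def run(words: List[str], lookahead: int) -> str:
    stream = LookaheadPunctuator(toy_punctuate, lookahead=lookahead)
    out = []
    for word in words:
        out.extend(stream.push(word))
    out.extend(stream.flush())
    return " ".join(out)


print(run("it works but slowly".split(), lookahead=1))
print(run("it works but slowly".split(), lookahead=0))
```

The trade-off is latency: a larger `lookahead` gives the model more right-hand context but delays every word on screen by that many words, and re-running the model per word is wasteful, so a real implementation would likely batch.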

@lissyx That might be worth exploring too. Especially for a custom language model. Do you think training the acoustic model on punctuated sentences would also make it learn to punctuate (though personally, I think that wouldn’t be the best idea for a general-purpose acoustic model)?

It might, but I don’t want to be held responsible for any damages :smiley: