Language Model influence on word timings

@dabinat could you briefly explain where and how we get the word beginning and ending times, and whether the use of a language model influences that process? Or is there documentation about it already? I would like to have “phoneme” timings or letter timings instead, and I’m wondering how difficult that would be to implement. Thanks in advance

Extended Metadata already exposes per-character timing, have you tried that?

The API currently only exposes letter timings. As far as I know, only the native client converts them to word timings; the other clients (Python, .NET, etc.) expose only the letter timings. So either use those clients, or edit the native client to remove the word timings.
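
For reference, here’s a rough sketch of pulling the per-character timings from the Python package and grouping them into words, which is roughly what the native client does. It assumes the 0.7-style metadata layout (metadata.transcripts[0].tokens with .text and .start_time; older releases expose metadata.items with .character instead) and hypothetical model/audio paths:

```python
import wave
import numpy as np
from deepspeech import Model

# Hypothetical paths: replace with your own model and a 16 kHz mono 16-bit WAV.
MODEL_PATH = "deepspeech-0.7.4-models.pbmm"
AUDIO_PATH = "audio.wav"

model = Model(MODEL_PATH)

with wave.open(AUDIO_PATH, "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

# sttWithMetadata returns per-token (per-character) timing information.
metadata = model.sttWithMetadata(audio, 1)
tokens = metadata.transcripts[0].tokens

# Group consecutive non-space tokens into words: the word's start is the
# start time of its first character, the end is the start time of its last.
words = []
current = None
for token in tokens:
    if token.text == " ":
        if current is not None:
            words.append(current)
            current = None
    else:
        if current is None:
            current = {"word": token.text, "start": token.start_time, "end": token.start_time}
        else:
            current["word"] += token.text
            current["end"] = token.start_time
if current is not None:
    words.append(current)

for w in words:
    print(f"{w['word']}: {w['start']:.2f}s - {w['end']:.2f}s")
```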

@lissyx Do you think it would be useful to have a native client flag to toggle between letter and word timings?

No, the deepspeech C++ binary is not intended for more than demo purposes.

That’s my point: there’s already enough information exposed, and there are examples of how to use it the way @ena.1994 needs.

Quick question on this: is the per-character timing calculated directly from the frame classification, or is there a more sophisticated implementation? I fear that alignments from LSTM+CTC training can be quite off.

See e.g. Sak, Haşim, et al. “Learning acoustic frame labeling for speech recognition with recurrent neural networks.” 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015.

It’s a pretty simple implementation so the start point is often late. A 3ms offset worked decently on my test files.
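
To make that concrete, the correction amounts to shifting each word’s start time earlier by a fixed amount, clamped at zero. A minimal sketch, applied to the words list from the snippet above (treat the offset as something to tune on your own data):

```python
OFFSET_S = 0.003  # 3 ms shift; adjust for your own recordings

for w in words:
    # Pull the start time slightly earlier, but never before the audio begins.
    w["start"] = max(0.0, w["start"] - OFFSET_S)
```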

I’ve tried to expose the per-character timings with client.py, and after adding some code I’ve got it working now. But back to my original question: is there any documentation on how the timings are computed in dsdecoder? And how does the LM influence the timings? Thanks in advance