How can silence be better handled?

I have trained a custom model and it's giving acceptable results so far.

The problem is with silent moments, especially at the beginning. When I run inference with --json, the first word always starts at time=0, and it is usually detected wrongly. Subsequent words are usually correct.

The question is: how can silence be "skipped", so the first word doesn't always start at zero and detection improves?

This seems like the best place to ask about fixing it.

We've received reports of behavior like that on GitHub, but could not really reproduce it.
Some people adjusted the library to add a few milliseconds of padding (50 ms, I think) and it helped a lot. Given that we could not reproduce the original issue, it's hard to act on it for now.
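For anyone who wants to try the workaround without patching the library, the same effect can be approximated by prepending a short stretch of silence to the audio before inference. A minimal sketch with NumPy (the 16 kHz sample rate, the function name, and doing this outside the library are all assumptions, not part of the actual codebase):

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed model input sample rate
PAD_MS = 50          # padding duration mentioned in the thread

def pad_with_silence(waveform: np.ndarray, pad_ms: int = PAD_MS,
                     sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Prepend pad_ms milliseconds of silence (zeros) to a mono waveform."""
    pad_samples = int(sample_rate * pad_ms / 1000)
    silence = np.zeros(pad_samples, dtype=waveform.dtype)
    return np.concatenate([silence, waveform])

# Example: one second of placeholder audio, padded with 50 ms of silence.
audio = np.random.randn(SAMPLE_RATE).astype(np.float32)
padded = pad_with_silence(audio)
```

Note that word timestamps in the output will then be shifted by the padding duration, so subtract it back if exact timings matter.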

Thanks for that tip … will try it out.