I have trained a custom model and it's giving acceptable results so far.
The problem is with moments of silence, especially at the beginning. When I run inference with --json, the first word always starts at time=0, and it is usually the wrong word. The words that follow are usually correct.
The question is: how can the silence be “skipped”, so that the first word doesn't always start at zero and detection improves?
lissyx
This is the best place to fix it.
We've had reports of behavior like that on GitHub, but could not really replicate it.
Some people adjusted the library to add a few ms of padding (50, I think) and it helped a lot. Given that we could not replicate the original issue, it's hard to act on that for now.
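To make the padding idea concrete, here is a minimal sketch that prepends digital silence to the audio before it is passed to inference. Note that this pads the input rather than changing the library, which is a different approach from the one reported on GitHub. The function name `pad_with_silence`, the 16 kHz sample rate, and the choice to pad only the start are all assumptions for illustration; 50 ms is just the figure recalled above, not a tested value.

```python
import numpy as np


def pad_with_silence(samples: np.ndarray,
                     sample_rate: int = 16000,
                     pad_ms: int = 50) -> np.ndarray:
    """Prepend pad_ms of silence (zeros) to a mono PCM sample array.

    Hypothetical helper: sample_rate and pad_ms are assumed defaults,
    adjust them to match your model and recordings.
    """
    # Number of samples corresponding to pad_ms milliseconds.
    pad_len = sample_rate * pad_ms // 1000
    # Zeros in the same dtype as the input (e.g. int16 PCM).
    silence = np.zeros(pad_len, dtype=samples.dtype)
    return np.concatenate([silence, samples])


# Usage: one second of placeholder 16-bit audio, padded by 50 ms.
audio = np.ones(16000, dtype=np.int16)
padded = pad_with_silence(audio)
```

If you try this, keep in mind that any timings reported for the padded audio are relative to the padded signal, so they would need to be shifted back by the padding length to line up with the original recording.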