Yeah, I think some of the efficient KWS models use a binary ‘is keyword’ / ‘not keyword’ classification, and that helps with both model size and accuracy.
Google ran the “Visual Wake Words Challenge”, soliciting submissions of tiny vision models for microcontrollers.
https://github.com/mit-han-lab/VWW won it, and even though that is a vision model, my guess is that with spectrogram preprocessing such as MFCC (mel filterbank) it doesn’t matter much whether the input is image or voice.
I noticed another article https://blog.aspiresys.pl/technology/building-jarvis-nlp-hot-word-detection/ that also blurs the voice/image distinction with spectrogram preprocessing via MFCC (mel filterbank).
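Just to make that voice/image blurring concrete, here is a minimal sketch of MFCC-style preprocessing in plain NumPy, showing how a voice clip ends up as a 2-D feature “image” that a tiny vision-style CNN can classify. The frame sizes and filter counts are illustrative assumptions of mine, not taken from either linked article.

```python
# Minimal MFCC sketch in plain NumPy: a 1-D audio clip becomes a 2-D
# (frames x coefficients) matrix, i.e. an "image" for a tiny CNN.
# Parameters below are common defaults, assumed for illustration.
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters mapping an FFT power spectrum to mel bands."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bin_pts[i - 1], bin_pts[i], bin_pts[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix -- the 'image'."""
    fb = mel_filterbank(n_filters, n_fft, sr)
    # DCT-II matrix turning log-mel energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1)
                 / (2 * n_filters))
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * np.hanning(frame_len)
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        rows.append(dct @ np.log(fb @ power + 1e-10))
    return np.array(rows)

# One second of a 440 Hz tone at 16 kHz -> a 98 x 13 feature "image"
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(tone)
print(feats.shape)  # -> (98, 13)
```

From there a keyword clip and a photo really are the same kind of input tensor, which is presumably why the VWW-style models transfer so well.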
It seemed to be built with an extremely lightweight audio library and was also split into ‘is keyword’ / ‘not keyword’, with this starting dataset:
- 200 positive samples recorded over varying degrees of background noise.
- 100 positive samples recorded over silence.
- 200 negative samples of random words recorded over varying degrees of background noise.
- 100 negative samples recorded over silence.
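To show how small that setup is, here is a hypothetical sketch of how such a 600-clip binary dataset might be laid out and labelled. The directory names and counts mirror the list above; nothing here comes from the article itself.

```python
# Hypothetical layout for the 600-clip 'is keyword' / 'not keyword'
# dataset described above. Directory names are my own assumption.
from pathlib import Path

LAYOUT = {
    "positive/noise": 200,    # keyword over varying background noise
    "positive/silence": 100,  # keyword over silence
    "negative/noise": 200,    # random words over background noise
    "negative/silence": 100,  # random words over silence
}

def label_clips(root):
    """Yield (wav_path, label) pairs: 1 = keyword, 0 = not keyword."""
    for subdir in LAYOUT:
        label = 1 if subdir.startswith("positive") else 0
        for wav in sorted(Path(root, subdir).glob("*.wav")):
            yield wav, label
```

600 labelled clips is tiny by ASR standards, which is what makes the binary formulation attractive for microcontroller-sized models.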
It says that is a starting dataset, but from what I have seen that is very small and should produce a tiny model.
From playing with a Pi 4, the standard DeepSpeech model download and install seems to produce less load than what I have seen from the Mycroft Precise KWS.
So I thought I would ask: is there any way with DeepSpeech to create that is/not model, which seems to help keep model size down?
Also, it would be really great to get an accompanying KWS for DeepSpeech. It doesn’t need to be amazingly accurate, just something open source using pretty much the same lib set, more of a mock setup of how a KWS and DeepSpeech should interact.
But maybe the mentioned above could form a basic KWS for deepspeech?
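By “mock setup” I mean something like the following rough sketch: an always-on lightweight spotter gates the heavy ASR, which only sees the audio that follows a detection. Both components here are stand-in stubs of my own; the real DeepSpeech streaming API would slot in where the `asr` callable is.

```python
# Rough mock of a KWS -> ASR handoff. The spotter runs on every frame;
# the (expensive) recognizer runs only on the frames after a keyword
# hit. Both kws and asr are stand-in stubs, not real models.

def run_pipeline(frames, kws, asr, utterance_frames=3):
    """Feed frames through the spotter; transcribe only after a hit."""
    results = []
    i = 0
    while i < len(frames):
        if kws(frames[i]):
            # Hand the next few frames to the heavyweight recognizer
            chunk = frames[i + 1:i + 1 + utterance_frames]
            results.append(asr(chunk))
            i += 1 + utterance_frames
        else:
            i += 1
    return results

# Stubs: the "keyword" is the string "hey"; the "ASR" joins the frames.
frames = ["noise", "hey", "turn", "on", "lights", "noise"]
out = run_pipeline(frames,
                   kws=lambda f: f == "hey",
                   asr=lambda chunk: " ".join(chunk))
print(out)  # -> ['turn on lights']
```

Even a toy demo shaped like this would show newcomers where the KWS ends and DeepSpeech begins.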
Apologies for the necro, but a demo KWS would be great.