Restricted Vocabulary

I’ve been following this project for a while now. Great work. I am still wondering whether there’s a viable way to restrict the vocabulary to the words used in, e.g., a home automation environment, to improve accuracy. I just want DeepSpeech to understand a handful of phrases such as ‘turn the light on’ or ‘turn the living room fan off’. Since Snips was acquired by Sonos, pretty much everybody in the maker community is looking for alternatives.

Just put them into a text file and generate your own language model. This is documented under data/lm.

Just to be sure: does this require a new set of WAVs, or is the restricted language model extracted from the complete model?

m.

You build it yourself. What you build is the language model; you can re-use the acoustic model. We have experimented quite a lot with this, and it gives pretty good results for that kind of use case. This way you don’t have to repeat the long training step, and a small language model can be created quickly and efficiently. Just follow the link and build KenLM: I have code doing that on-device (RPi4), for example.
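
For anyone who wants a starting point, here is a rough sketch of the "text file, then KenLM" workflow in Python. It is not the code mentioned above, just an illustration. The device and state lists, the file names and the n-gram order are made up for the example, and it assumes KenLM's lmplz and build_binary tools are already built and on your PATH. Depending on your DeepSpeech version, the instructions under data/lm may require different KenLM options and an extra packaging step, so treat those as the reference.

```python
import itertools
import shutil
import subprocess
from pathlib import Path

# Hypothetical vocabulary for illustration; replace with your own commands.
DEVICES = ["light", "living room fan"]
STATES = ["on", "off"]


def build_corpus(devices=DEVICES, states=STATES):
    """Return one lowercase command per device/state pair."""
    return [f"turn the {d} {s}" for d, s in itertools.product(devices, states)]


def write_corpus(path, sentences):
    """Write one sentence per line, the plain-text input lmplz reads."""
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")


def kenlm_commands(corpus, arpa, binary, order=3):
    """Assemble the two KenLM invocations without running them."""
    return [
        # Estimate an n-gram model; --discount_fallback is typically
        # needed when the corpus is as tiny as this one.
        ["lmplz", "--order", str(order), "--text", str(corpus),
         "--arpa", str(arpa), "--discount_fallback"],
        # Convert the ARPA file into KenLM's binary trie format.
        ["build_binary", "trie", str(arpa), str(binary)],
    ]


if __name__ == "__main__":
    write_corpus("vocabulary.txt", build_corpus())
    for cmd in kenlm_commands("vocabulary.txt", "lm.arpa", "lm.binary"):
        if shutil.which(cmd[0]) is None:
            # KenLM is not installed here; show what would have been run.
            print("not on PATH, skipping:", " ".join(cmd))
            continue
        subprocess.run(cmd, check=True)
```

The acoustic model is not touched at all here; only the small text corpus and the language model built from it change.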

@lissyx would you mind sharing the code you mentioned for doing this on device? Thanks.

There’s nothing to share, I just built KenLM’s tools for ARM and ran them on-device …

Have a read here