Hello all,
I would like to raise a question about customizing the language model (and the reasoning behind doing so) when it is used for voice recognition. I want to build a simple application that controls a machine with a small "context" of commands (roughly 100 words in total, arranged into sentences with a defined structure like "Hey machine, {set|get} the {voltage|air pressure|whatever} to <value>"), and I would like to evaluate its performance in a noisy environment. All of this should run on a Raspberry Pi 4, so the hardware demands must stay modest.
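To make the idea concrete, here is roughly the setup I have in mind. This is only a sketch assuming a Kaldi-based engine such as Vosk with one of its small pretrained models (which reportedly run fine on a Pi 4); the model path, phrase list, and test file are placeholders:

```python
import json
import wave

from vosk import Model, KaldiRecognizer

# Placeholder path to a small pretrained model,
# e.g. vosk-model-small-en-us-0.15.
model = Model("model")

# Restrict decoding to the command phrases; "[unk]" lets the recognizer
# report out-of-grammar speech instead of forcing it onto a command.
grammar = json.dumps([
    "hey machine set the voltage to ten",
    "hey machine get the air pressure",
    "[unk]",
])

wf = wave.open("command.wav", "rb")  # 16 kHz mono 16-bit PCM test recording
rec = KaldiRecognizer(model, wf.getframerate(), grammar)

while True:
    data = wf.readframes(4000)
    if not data:
        break
    rec.AcceptWaveform(data)

print(json.loads(rec.FinalResult())["text"])
```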
First, I would like to double-check that one of my assumptions is correct: limiting the vocabulary and language model should also improve performance in a noisy environment, correct? Since I am essentially shrinking the set of words/N-grams that can occur, fewer hypotheses compete for the acoustic evidence, so the probability that the correct word sequence is recognized should increase even when the audio is noisy.
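Written out, my mental model is the standard decoding criterion (as I understand it): the decoder searches for

$$\hat{W} = \arg\max_{W \in \mathcal{G}} P(X \mid W)\, P(W),$$

where $X$ is the (noisy) audio, $W$ a word sequence, and $\mathcal{G}$ the set of word sequences the language model allows. Shrinking $\mathcal{G}$ from all possible sentences down to a ~100-word command grammar removes most of the competing hypotheses that noise could otherwise push above the correct one.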
In general, my questions are:
- Is there a common way to create such a custom limited vocabulary and language model while reusing pretrained models for the other components? I.e., so that I don't need, for example, my own set of audio recordings? (The first sketch after this list shows what I have in mind.)
- Is there a way to configure the system as follows? Say I have the "set" and "get" commands for different parameters, and I would like to set the probabilities to {0.1, 0.1, 0.8} for {"set", "get", <unknown>}, putting the <unknown> token there to avoid a false detection of "set" or "get" when a completely different word is spoken. Or is there a better way to handle this? (See the second sketch after this list.)
- In my scenario, what else could I tune to increase the noise resistance?
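Regarding the first question: since the grammar is tiny, my current idea is to enumerate every legal sentence from the template and build a small N-gram LM from that text, reusing a pretrained acoustic model as-is. A sketch is below; the word lists are placeholders, and the KenLM invocation in the comment is only what I believe is typical, not something I have verified on the Pi:

```python
from itertools import product

ACTIONS = ["set", "get"]
PARAMETERS = ["voltage", "air pressure"]  # placeholder subset of the ~100 words
VALUES = ["ten", "twenty", "fifty"]       # spelled-out values, placeholder

# Expand the command template into every legal sentence; with a grammar this
# small, full enumeration is cheap and gives the LM tool complete coverage.
with open("corpus.txt", "w") as f:
    for action, param, value in product(ACTIONS, PARAMETERS, VALUES):
        f.write(f"hey machine {action} the {param} to {value}\n")

# corpus.txt could then be fed to an N-gram tool, e.g. KenLM:
#   lmplz -o 3 --discount_fallback < corpus.txt > commands.arpa
# (--discount_fallback because such a tiny corpus breaks the default
#  Kneser-Ney discount estimation.)
```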
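And for the second question, the closest concrete thing I can picture is hand-writing a tiny ARPA file whose unigram log10 probabilities encode the {0.1, 0.1, 0.8} split, with <unknown> mapped to the <unk> token. Again only a sketch (the sentence-boundary probabilities are placeholders, and the mass would need renormalizing in a real file); I don't know whether this is the idiomatic approach:

```python
import math

# Desired priors: 10% "set", 10% "get", and 80% for the catch-all <unk>
# token that should absorb any other word.
probs = {"set": 0.1, "get": 0.1, "<unk>": 0.8}

arpa_lines = [
    "\\data\\",
    f"ngram 1={len(probs) + 2}",  # +2 for the sentence markers <s> and </s>
    "",
    "\\1-grams:",
    "-99.0000\t<s>",   # <s> is never predicted, hence the conventional -99
    "-1.0000\t</s>",   # placeholder sentence-end probability
]
for word, p in probs.items():
    arpa_lines.append(f"{math.log10(p):.4f}\t{word}")
arpa_lines += ["", "\\end\\"]

with open("weights.arpa", "w") as f:
    f.write("\n".join(arpa_lines) + "\n")
```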
Thank you in advance for your help!