Just a note: the release is v0.2.0, not v2.0.0, which is a big difference.
As for which words and phrases work with v0.1.1 versus v0.2.0, with which background noise, which microphones, and which up/down sampling: unfortunately, we have little control over that.
What I’d suggest is to create a data set of the words and phrases you expect, with background noise from your use case, recorded with the specific microphones you expect to use, and passed through the up/down sampling in your processing chain. Then take this data and fine-tune the model we provided to your use case.
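If you can't record every phrase in the real environment, one way to get the background noise into the data set is to mix a noise recording into clean speech at a chosen signal-to-noise ratio. Below is a rough sketch of that step only, not something from our training code. The names `rms` and `mix_at_snr` are made up for illustration, and it assumes both clips are already decoded to float samples at the same sample rate.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Hypothetical helper: assumes float samples and matching sample rates.
    """
    speech = list(speech)
    noise = list(noise)
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Loop the noise clip so it covers the whole utterance, then trim it.
    repeats = len(speech) // len(noise) + 1
    noise = (noise * repeats)[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        return speech  # silent noise clip, nothing to mix in
    # SNR_dB = 20 * log10(rms_speech / rms_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    random.seed(0)
    # Stand-ins for real recordings: a 440 Hz tone and uniform random noise.
    tone = [0.5 * math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    hiss = [random.uniform(-1.0, 1.0) for _ in range(4000)]
    noisy = mix_at_snr(tone, hiss, snr_db=10.0)
    print(len(noisy), "samples mixed at 10 dB SNR")
```

The sketch does not guard against clipping, so check the peak level before writing the result back to 16-bit audio. Noise recorded through your actual microphones and processing chain will be more representative than synthetic noise.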
In addition, you can create a language model and trie that are tuned to your use case.
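A language model is built from plain text, so a reasonable first step is to collect the phrases you expect into a normalized corpus, one phrase per line. Here is a minimal sketch of that preparation. The function names and the allowed character set are assumptions for illustration; the character set should match whatever alphabet your model actually outputs, and building the language model and trie from the resulting file is left to your toolchain.

```python
import re

# Assumed alphabet: lowercase letters, apostrophe, and space.
ALLOWED_CHARS = "abcdefghijklmnopqrstuvwxyz' "


def normalize_phrase(text, allowed=ALLOWED_CHARS):
    """Lowercase the text, replace characters outside the alphabet with
    spaces, and collapse runs of spaces."""
    text = text.lower()
    text = "".join(ch if ch in allowed else " " for ch in text)
    return re.sub(r" +", " ", text).strip()


def build_corpus(phrases):
    """Normalize each phrase and drop the ones that end up empty.

    Duplicates are kept on purpose, since how often a phrase appears
    tells the language model how likely it is.
    """
    corpus = []
    for phrase in phrases:
        normalized = normalize_phrase(phrase)
        if normalized:
            corpus.append(normalized)
    return corpus


if __name__ == "__main__":
    expected = ["Turn ON the lights!", "turn on the lights", "What's the time?"]
    for line in build_corpus(expected):
        print(line)
```

Digits and other symbols are simply dropped here, so spell out numbers ("five" rather than "5") before normalizing if your phrases contain them.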