I’m using a homemade model for the French language; the files are OK.
My mic array records continuously, and inference runs each time a VAD (voice activity detection) function cuts a segment.
When I talk with no background noise, inference is very good, but
with the TV or other noise, inference produces nonsense like this:
le a eaefanke eethe
It doesn’t correspond to any word in my vocabulary.txt
My question:
How can I restrict inference to known words (and discard the others)?
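
To illustrate what I mean by "restrict to known words", here is a minimal sketch of a post-filter applied to the transcript after inference. It is only an assumption about one possible approach, not something the engine provides: `load_vocabulary` and `keep_known_words` are hypothetical helpers, and I am assuming vocabulary.txt is plain text with whitespace-separated words. The sample vocabulary lines are made up for the example.

```python
def load_vocabulary(lines):
    """Build a set of known words from an iterable of text lines.

    Assumption: each line holds whitespace-separated words.
    """
    words = set()
    for line in lines:
        words.update(line.lower().split())
    return words


def keep_known_words(transcript, vocabulary):
    """Drop every word of the transcript that is not in the vocabulary."""
    return " ".join(w for w in transcript.lower().split() if w in vocabulary)


# Hypothetical vocabulary content; in practice it would be read with
# open("vocabulary.txt", encoding="utf-8").
vocab = load_vocabulary(["allume la lumière", "le salon"])

print(keep_known_words("le a eaefanke eethe", vocab))  # prints "le"
```

This only cleans up the text after the fact; it does not stop the model from decoding noise in the first place, which is what I would really like to achieve.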