Testing with my own voice, it recognized “yes” and “no” very well, so in general it seems to work nicely.
The problem is that if I give it input containing other words (“test”, “check”, …), the DeepSpeech algorithm recognizes them as “yes”, which I don’t want: it should report that no known word was recognized.
I use my own language model where only the 2 words yes and no are allowed.
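For reference, a language model restricted like this is typically built from a tiny text corpus that contains only the allowed words. A minimal sketch of producing that corpus (the file name `vocabulary.txt` is an assumption, not from this thread; the KenLM build step itself is external and only mentioned in the comments):

```python
# Write a minimal corpus containing only the allowed commands, one per
# line. This file would then be fed to KenLM (lmplz / build_binary) to
# produce the binary language model that DeepSpeech loads -- that step
# is external and not shown here.
allowed_words = ["yes", "no"]

with open("vocabulary.txt", "w") as f:
    for word in allowed_words:
        f.write(word + "\n")
```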
Is there a way to solve this?
lissyx
The acoustic model outputs probabilities for each character class, not at word level. So once those are decoded and we have a string, I’m not sure we still have that “known” / “unknown” word information. Maybe I misunderstood your point, but I think you should look deeper into why your model confuses other words for one of the known words. My guess is that it’s overfitted.
You need to add an “unknown” class and train with it. Then, in your training data, set the transcription of everything that isn’t a yes or a no to “unknown”. That’d be a good start. The language model needs to be aware of the unknowns too.
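Relabelling the training data as suggested could be done with a small script over the CSV. A sketch, assuming the standard DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`) and hypothetical file names; the toy input at the top only exists to make the example self-contained:

```python
import csv

ALLOWED = {"yes", "no"}

# Toy input for illustration; in practice this is your real train.csv.
with open("train.csv", "w", newline="") as f:
    f.write("wav_filename,wav_filesize,transcript\n"
            "a.wav,100,yes\n"
            "b.wav,100,test\n"
            "c.wav,100,no\n")

def relabel(row):
    """Force any transcript outside the allowed set to 'unknown'."""
    if row["transcript"].strip().lower() not in ALLOWED:
        row = dict(row, transcript="unknown")
    return row

# Rewrite the CSV in one pass, keeping the column layout intact.
with open("train.csv", newline="") as src, \
     open("train_unknown.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow(relabel(row))
```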
I used the guide from elpimous_robot for creating the language model. But I don’t think the language model is the problem: I trained a model without a language model and also ran the DeepSpeech recognition without one, and the other words are still recognized wrongly.
@reuben
I am not sure if this is the right way to go. If I wanted every word other than yes or no to be unclassified, I would have to train on all those possible words as “unknown” as well.
What I realized is that I used the same data for training, dev and test. I need to change that for sure… Could this be the problem, causing the model to overfit?
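Fixing that overlap usually means shuffling the full dataset once and splitting it into disjoint train/dev/test sets (a common split is 80/10/10). A sketch with synthetic rows standing in for the real CSV; all file names and the split ratio are illustrative:

```python
import csv
import random

# Synthetic rows standing in for the real dataset CSV.
rows = [{"wav_filename": f"clip{i}.wav", "wav_filesize": 100,
         "transcript": "yes" if i % 2 else "no"} for i in range(100)]

random.seed(42)      # make the split reproducible
random.shuffle(rows)

n = len(rows)
train = rows[: int(0.8 * n)]               # 80% for training
dev = rows[int(0.8 * n): int(0.9 * n)]     # 10% for validation
test = rows[int(0.9 * n):]                 # 10% held out for testing

for name, subset in [("train.csv", train), ("dev.csv", dev),
                     ("test.csv", test)]:
    with open(name, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["wav_filename", "wav_filesize", "transcript"])
        writer.writeheader()
        writer.writerows(subset)
```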
lissyx
If you get the wrong words out of inference without a language model, then your training is simply overfitting too much, and the model is unable to recognize that it cannot recognize an unknown word.
Ok, so it might be an overfitting problem. But how do I solve it?
I tried early stopping, which should reduce overfitting, but it does not really help. I also played around with the n_hidden parameter, but that made no difference either.
I wonder if there is a way to define an unknown class. I tried putting different things into the transcript field of the CSV file, like “”, " ", or just an empty space, but everything throws errors.
I also tried different amounts of training samples. The highest I used is about 800 per voice command.
Or is it just not possible with DeepSpeech to train a model that recognizes only two words?
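One pragmatic workaround for a two-word recognizer is to post-filter the decoded string: accept it only if it is exactly one of the allowed commands, optionally combined with the confidence score that newer DeepSpeech releases expose via `sttWithMetadata`. A sketch; the function name and threshold value are illustrative assumptions, not from this thread, and the threshold would need tuning on real data:

```python
# Post-filter sketch: accept the decoded string only if it is exactly
# one of the allowed commands and the decoder was confident enough.
ALLOWED = {"yes", "no"}

def accept_command(text, confidence, threshold=-5.0):
    """Return the recognized command, or None for anything unknown.

    `confidence` is assumed to be a log-domain score (higher is better);
    the default threshold of -5.0 is an arbitrary illustration.
    """
    word = text.strip().lower()
    return word if word in ALLOWED and confidence >= threshold else None

# With the DeepSpeech 0.9 Python API, the confidence could come from
# model.sttWithMetadata(audio_int16, 1).transcripts[0].confidence
# (not run here, since it needs a trained model and audio input).
```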