How to classify unknown words, how to ignore words

Hi, I am trying to implement a command recognizer for simple spoken words.
For the first test I am using only “yes” and “no”. I trained my own model with the audio data from https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

Testing with my own voice it could recognize “yes” and “no” very well, so in general it seems to work nicely.

The problem is that if I give it input containing other words (“test”, “check”, …), DeepSpeech recognizes them as “yes”, which is not what I want -> it should report that no known word was recognized.

I use my own language model where only the 2 words yes and no are allowed.
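
For reference, this is roughly the procedure I followed to build it (a minimal sketch; it assumes KenLM’s lmplz and build_binary are on the PATH, and the trie generation step is version-dependent so I left it out):

```python
# Sketch: build a tiny yes/no language model with KenLM.
# --discount_fallback is needed because the corpus is tiny.
import subprocess

# 1. A corpus that contains only the allowed commands.
with open("vocabulary.txt", "w") as f:
    f.write("yes\nno\n")

# 2. Estimate a small ARPA model and convert it to KenLM's binary format.
with open("vocabulary.txt") as corpus, open("words.arpa", "w") as arpa:
    subprocess.run(["lmplz", "--order", "2", "--discount_fallback"],
                   stdin=corpus, stdout=arpa, check=True)
subprocess.run(["build_binary", "words.arpa", "lm.binary"], check=True)
```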

Is there a way to solve this?

The acoustic model outputs per-character probabilities, not word-level ones. So once we have decoded those into a string, I’m not sure we still have any notion of “known” / “unknown” words. Maybe I misunderstood your point, but I think you should look deeper into why your model confuses other words with one of the known words. My guess is that it is overfitted.

Maybe have a look at what @elpimous_robot did? He worked on something similar: TUTORIAL : How I trained a specific french model to control my robot

What is the output when you run the inference without the language model (omit lm.binary and trie from your inference call)?

It’s going to be a string as well :slight_smile:

You need to add an “unknown” class and train with it. Then, in your training data, set the transcription of everything that isn’t a yes or a no to “unknown”. That’d be a good start. The language model needs to be aware of the unknowns too.
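
Something along these lines would do the relabeling, assuming your CSVs use the usual wav_filename,wav_filesize,transcript layout (just a sketch, adjust paths and column names to your files):

```python
# Sketch: relabel every clip whose transcript is not "yes" or "no" as "unknown"
# in a DeepSpeech-style CSV (columns assumed: wav_filename, wav_filesize, transcript).
import csv

KNOWN = {"yes", "no"}

with open("train.csv", newline="") as src, open("train_relabel.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["transcript"].strip().lower() not in KNOWN:
            row["transcript"] = "unknown"
        writer.writerow(row)
```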

Certainly :), but it’d be interesting to see whether the acoustic model already turns any input into yes/no characters, or whether it’s the LM part that does that.
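
Something like this lets you compare both decodings on the same clip (a sketch modeled on the 0.x Python client; the Model constructor and stt() arguments have changed between releases, so check the client.py that ships with your version):

```python
# Sketch: compare acoustic-model-only decoding with LM-assisted decoding.
# Signatures differ between DeepSpeech releases -- the values below
# (26 MFCC features, context of 9, beam width 500, alpha/beta weights)
# are the defaults from the 0.x native client.
import wave
import numpy as np
from deepspeech import Model

N_FEATURES, N_CONTEXT, BEAM_WIDTH = 26, 9, 500
LM_ALPHA, LM_BETA = 0.75, 1.85

with wave.open("test_yes.wav", "rb") as w:
    rate = w.getframerate()
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

ds = Model("output_graph.pb", N_FEATURES, N_CONTEXT, "alphabet.txt", BEAM_WIDTH)
print("acoustic model only:", ds.stt(audio, rate))

ds.enableDecoderWithLM("alphabet.txt", "lm.binary", "trie", LM_ALPHA, LM_BETA)
print("with language model:", ds.stt(audio, rate))
```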

I worked on something like that…
See #795

But I agree with Lissyx: it seems like an overfitting problem!

Thanks for all the replies so far!

I used the guide from elpimous_robot to create the language model, but I don’t think the language model is the problem. I trained a model without a language model and also ran the deepspeech recognition without one, and the other words are still recognized incorrectly.

@reuben
I am not sure if this is the right way to go. If I want every word other than yes or no to be unclassified, I would have to train on all of these possible words as “unknown” as well.

What I realized is that I used the same data for training, dev and test. I need to change that for sure… Could this be the reason the model is overfitted?
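
Something like this should give me a proper split of my single CSV (a quick sketch; the 80/10/10 ratio is just a guess and the column layout is whatever DeepSpeech expects):

```python
# Sketch: shuffle one DeepSpeech CSV and split it 80/10/10 into train/dev/test
# so the three sets no longer share any clips.
import csv
import random

with open("all.csv", newline="") as f:
    reader = csv.reader(f)
    header, rows = next(reader), list(reader)

random.seed(42)
random.shuffle(rows)

n = len(rows)
splits = {
    "train.csv": rows[: int(0.8 * n)],
    "dev.csv": rows[int(0.8 * n): int(0.9 * n)],
    "test.csv": rows[int(0.9 * n):],
}
for name, subset in splits.items():
    with open(name, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(subset)
```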

If you get the wrong words out of the inference without the language model, then your training is simply overfitted, and the model is unable to tell that it cannot recognize an unknown word :slight_smile:

OK, so it might be an overfitting problem. But how do I solve it?
I tried early stopping, which should reduce overfitting, but it does not really help. I also played around with the n_hidden parameter, but that did not change anything either.

I wonder if there is a way to define an unknown class. I tried putting different things into the transcript column of the csv file, like “”, " " or just an empty space, but everything throws errors.

I also tried different amounts of training samples; the highest was about 800 per voice command.

Or is it just not possible with DeepSpeech to build a model that recognizes only 2 words?