Testing with my own voice, it recognized “yes” and “no” very well, so in general it seems to work nicely.
The problem is that if I give it input containing other words (“test”, “check”, …), the DeepSpeech algorithm recognizes them as “yes”, which I don’t want: it should report that no known word was recognized.
I use my own language model where only the 2 words yes and no are allowed.
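For reference, a language model restricted like this is typically built from a tiny text corpus that contains only the allowed words. A minimal sketch of producing that corpus (the file name `vocabulary.txt` is an assumption, not from this thread; the KenLM build step itself is external and only mentioned in the comments):

```python
# Write a minimal corpus containing only the allowed commands, one per
# line. This file would then be fed to KenLM (lmplz / build_binary) to
# produce the binary language model that DeepSpeech loads -- that step
# is external and not shown here.
allowed_words = ["yes", "no"]

with open("vocabulary.txt", "w") as f:
    for word in allowed_words:
        f.write(word + "\n")
```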
Is there a way to solve this?
lissyx
The acoustic model outputs probabilities for each character class, not at word level. So once those are decoded and we have a string, I’m not sure we still have that “known” / “unknown” word information. Maybe I misunderstood your point, but I think you should look deeper into why your model confuses other words for one of the known words. My guess is that it’s overfitted.
You need to add an “unknown” class and train with it. Then, in your training data, set the transcription of everything that isn’t a yes or a no to “unknown”. That’d be a good start. The language model needs to be aware of the unknowns too.
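Relabelling the training data as suggested could be done with a small script over the CSV. A sketch, assuming the standard DeepSpeech CSV columns (`wav_filename`, `wav_filesize`, `transcript`) and hypothetical file names; the toy input at the top only exists to make the example self-contained:

```python
import csv

ALLOWED = {"yes", "no"}

# Toy input for illustration; in practice this is your real train.csv.
with open("train.csv", "w", newline="") as f:
    f.write("wav_filename,wav_filesize,transcript\n"
            "a.wav,100,yes\n"
            "b.wav,100,test\n"
            "c.wav,100,no\n")

def relabel(row):
    """Force any transcript outside the allowed set to 'unknown'."""
    if row["transcript"].strip().lower() not in ALLOWED:
        row = dict(row, transcript="unknown")
    return row

# Rewrite the CSV in one pass, keeping the column layout intact.
with open("train.csv", newline="") as src, \
     open("train_unknown.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        writer.writerow(relabel(row))
```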
I used the guide from elpimous_robot for creating the language model. But I don’t think the language model is the problem: I trained a model without a language model and also ran the DeepSpeech recognition without one, and the other words are still recognized wrongly.
@reuben
I am not sure if this is the right way to go. If I wanted every word other than yes or no to be unclassified, I would have to train on all those possible words as “unknown” as well.
What I realized is that I used the same data for training, dev and test. I need to change that for sure… Could this be the problem, causing the model to overfit?
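Fixing that overlap usually means shuffling the full dataset once and splitting it into disjoint train/dev/test sets (a common split is 80/10/10). A sketch with synthetic rows standing in for the real CSV; all file names and the split ratio are illustrative:

```python
import csv
import random

# Synthetic rows standing in for the real dataset CSV.
rows = [{"wav_filename": f"clip{i}.wav", "wav_filesize": 100,
         "transcript": "yes" if i % 2 else "no"} for i in range(100)]

random.seed(42)      # make the split reproducible
random.shuffle(rows)

n = len(rows)
train = rows[: int(0.8 * n)]               # 80% for training
dev = rows[int(0.8 * n): int(0.9 * n)]     # 10% for validation
test = rows[int(0.9 * n):]                 # 10% held out for testing

for name, subset in [("train.csv", train), ("dev.csv", dev),
                     ("test.csv", test)]:
    with open(name, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["wav_filename", "wav_filesize", "transcript"])
        writer.writeheader()
        writer.writerows(subset)
```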
lissyx
If you get the wrong words out of inference without a language model, then your training is simply overfitting too much, and the model is unable to recognize that it cannot recognize an unknown word.
Ok, so it might be an overfitting problem. But how do I solve it?
I tried early stopping, which should reduce overfitting, but it does not really help. I also played around with the n_hidden parameter, but that made no difference either.
I wonder if there is a way to define an unknown class. I tried putting different things into the transcript field of the CSV file, like “”, " ", or just an empty space, but everything throws errors.
I also tried different amounts of training samples. The highest I used is about 800 per voice command.
Or is it just not possible with DeepSpeech to train a model that recognizes only two words?
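One pragmatic workaround for a two-word recognizer is to post-filter the decoded string: accept it only if it is exactly one of the allowed commands, optionally combined with the confidence score that newer DeepSpeech releases expose via `sttWithMetadata`. A sketch; the function name and threshold value are illustrative assumptions, not from this thread, and the threshold would need tuning on real data:

```python
# Post-filter sketch: accept the decoded string only if it is exactly
# one of the allowed commands and the decoder was confident enough.
ALLOWED = {"yes", "no"}

def accept_command(text, confidence, threshold=-5.0):
    """Return the recognized command, or None for anything unknown.

    `confidence` is assumed to be a log-domain score (higher is better);
    the default threshold of -5.0 is an arbitrary illustration.
    """
    word = text.strip().lower()
    return word if word in ALLOWED and confidence >= threshold else None

# With the DeepSpeech 0.9 Python API, the confidence could come from
# model.sttWithMetadata(audio_int16, 1).transcripts[0].confidence
# (not run here, since it needs a trained model and audio input).
```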