OK, so it might be an overfitting problem. But how do I solve it?
I tried early stopping, which should reduce overfitting, but it does not really help. I also played around with the n_hidden parameter, but that made no difference either.
I wonder if there is a way to define an "unknown" class. I tried putting different things into the transcript column of the CSV file, such as "", " ", or just an empty space, but each attempt throws an error.
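To make the unknown-class idea concrete, here is a minimal sketch of a CSV that uses a placeholder word instead of an empty transcript. The column names follow the usual DeepSpeech training CSV layout; everything else is an assumption for illustration: the label `unknown`, the commands `start` and `stop`, the file paths and sizes, and the helper names are all hypothetical. It also assumes the letters of the placeholder are present in the alphabet file, and I have not verified that a model trained this way actually behaves better.

```python
import csv
import io

# Column names used by DeepSpeech training CSVs.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]

# Hypothetical placeholder word for clips that contain neither command.
UNKNOWN_LABEL = "unknown"


def make_rows(samples, commands):
    """Turn (path, size_in_bytes, spoken_text) tuples into CSV rows.

    Any clip whose text is not one of the commands gets UNKNOWN_LABEL
    instead of an empty transcript.
    """
    rows = []
    for path, size, text in samples:
        text = text.strip().lower()
        transcript = text if text in commands else UNKNOWN_LABEL
        rows.append(
            {"wav_filename": path, "wav_filesize": size, "transcript": transcript}
        )
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data: two command clips and one background-noise clip.
samples = [
    ("clips/start_001.wav", 32044, "start"),
    ("clips/stop_001.wav", 30012, "stop"),
    ("clips/noise_001.wav", 28800, ""),
]
csv_text = to_csv(make_rows(samples, commands={"start", "stop"}))
print(csv_text)
```

The point of the sketch is only that every row ends up with a non-empty transcript, which avoids the errors the empty values trigger.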
I also tried different numbers of training samples. The most I have used is about 800 per voice command.
Or is it simply not possible to train a DeepSpeech model that recognizes only 2 words?