"Hello this is a test" returns "hallo this is a tast" (other non-words also returned)

It seems that my installation of deepspeech (0.6.1) is returning non-words:

  • “Hello this is a test” returns “hallo this is a tast”
  • “i like apples and oranges” returns “i like auples and aranges”

Am I doing something wrong? Can I configure it to return only actual words?

The decoder looks up words in the language model (trie + lm.binary). If “tast” is in there, it can appear in the output. So you could build your own language model with a smaller vocabulary and fewer word combinations, limited to what you actually expect to hear.
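To check whether a word like “tast” could come from your own text, here is a minimal sketch in plain Python. It is not DeepSpeech's decoder and does not read lm.binary or the trie. It only builds a word set from the sentences you would train a language model on, and lists the transcript words that are missing from it. The function names and the tiny corpus are made up for illustration.

```python
import re

# Assumption: words consist of lowercase a-z and apostrophes only.
WORD_PATTERN = re.compile(r"[a-z']+")

def build_vocabulary(sentences):
    """Collect every distinct lowercase word that appears in the corpus."""
    vocab = set()
    for sentence in sentences:
        vocab.update(WORD_PATTERN.findall(sentence.lower()))
    return vocab

def out_of_vocabulary(transcript, vocab):
    """Return the transcript words that are missing from the vocabulary."""
    words = WORD_PATTERN.findall(transcript.lower())
    return [word for word in words if word not in vocab]

# Hypothetical corpus: the sentences the language model would be built from.
corpus = ["hello this is a test", "i like apples and oranges"]
vocab = build_vocabulary(corpus)

print(out_of_vocabulary("hallo this is a tast", vocab))  # ['hallo', 'tast']
```

If a non-word shows up in the transcript but not in your corpus, a language model built only from that corpus has no entry for it.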

So I have downloaded generate_lm.py and want to use it to generate my own language model, but I’m unsure how to do it… Do I just need to feed it a text file containing my sentences?
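For the text file itself, a rough sketch of how the sentences could be prepared, one normalized sentence per line. This assumes an English alphabet of a-z, apostrophe and space, and assumes the script wants plain text with one sentence per line. Verify both against the code and the docs for your version (the question is about 0.6.1, and the docs may describe a different release). The helper names are hypothetical, and nothing here calls generate_lm.py.

```python
import io
import re

def normalize_line(line):
    """Lowercase a sentence and keep only a-z, apostrophes and single spaces."""
    text = line.lower()
    # Assumption: every other character is not in the model's alphabet.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return " ".join(text.split())

def write_corpus(raw_lines, out):
    """Write one normalized sentence per line to a text file handle.

    Returns the number of non-empty sentences written.
    """
    count = 0
    for line in raw_lines:
        cleaned = normalize_line(line)
        if cleaned:
            out.write(cleaned + "\n")
            count += 1
    return count

# Demo with an in-memory buffer; a real file opened with
# open("corpus.txt", "w", encoding="utf-8") works the same way.
buffer = io.StringIO()
written = write_corpus(["Hello, this is a TEST!", "   ", "I like apples and oranges."], buffer)
print(written)
print(buffer.getvalue(), end="")
```

The resulting file is what you would then hand to the language model tooling, using whatever options the script for your version documents.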

Sorry, you’ll have to read the code and/or the documentation. Building your own language model makes up roughly 20% of the questions here, so be prepared to put in a couple of hours.

https://deepspeech.readthedocs.io/en/v0.7.0/TRAINING.html