DeepSpeech generates long nonsense tokens as output

I tested DeepSpeech's pretrained model (deepspeech-0.1.1-models) on a 1-minute recording stored as a WAV file resampled to 16 kHz. The result is strange: the output contains very long tokens that do not make any sense at all. For example:

“rairaiaprgramsthereussey”,

“proremihadthemigtybetardprogramteyeradiqhorembertireseveted”

… …

These long tokens also do not seem easy to segment into several valid words. I have tried other speech-to-text APIs and have not encountered the same issue. I would appreciate it if anyone could shed light on this. Thanks!
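
In case the input format turns out to be the culprit, here is a minimal sketch for checking the header of the WAV file. It assumes the pretrained model wants 16 kHz, 16-bit, mono PCM audio; that requirement is my assumption and has not been confirmed in this thread. The function name `check_wav` and the `EXPECTED_*` constants are made up for illustration, and only Python's standard `wave` module is used.

```python
import wave

# Assumed input format (not confirmed in this thread): 16 kHz, 16-bit, mono PCM.
EXPECTED_RATE = 16000        # samples per second
EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # bytes per sample, i.e. 16-bit


def check_wav(path):
    """Return a list of mismatches between the WAV header and the assumed format."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()

    problems = []
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels, expected {EXPECTED_CHANNELS}")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {width} bytes, expected {EXPECTED_SAMPLE_WIDTH}")
    return problems
```

Calling `check_wav("recording.wav")` (with a hypothetical file name) returns an empty list when the header matches the assumed format, and one message per mismatch otherwise. It only inspects the header, so it cannot tell whether the resampling itself was done correctly.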

https://github.com/mozilla/DeepSpeech/issues/1156