Why are words combined when we use a language model?


(Deepak Banka) #1

If I run inference with my model without a language model, I get this output:

on a istanash seeeelad foundre of asios sociats smalls a ce firm sand marets texes whe o engierating architectua o serven primarily servin cowe recourte this conversation sermey so lo we’re here fore isaske so questions about the wayyuecannoct meetings wire then your business we are in t trig to get insight on aa application were developing thit takes the audio

If I run inference with the language model, I get:

on a istanashseeeeladfoundreofasiossociatssmallsacefirmsandmaretstexeswheyoengieratingaarchitectuaoservenprimarilyservintcoweyrecourtethis conversation serve so what we are here for is ask some questions about the way you can not meetings were then your business we are in a try to get insight on an application were developing that takes the audio

My question is: why are many words joined together when we use the language model?


(kdavis) #2

Not answering your question, but making a suggestion.

As documented in the README:

Once everything is installed you can then use the deepspeech binary to do speech-to-text on short, approximately 5 second, audio files (currently only WAVE files with 16-bit, 16 kHz, mono are supported in the Python client)

The sentences you are feeding the system seem longer than 5 seconds, assuming they’re not from Steve Woodmore.

To improve the performance of the acoustic and language models, you should limit your audio files to about 5 seconds in length.
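
As a rough illustration of that suggestion, here is a minimal sketch that cuts a longer WAVE file into pieces of about 5 seconds using only Python's standard `wave` module. The function name `split_wav`, the `chunk_seconds` parameter, and the output file naming are made up for this example and are not part of DeepSpeech. It cuts at fixed intervals, so a word can end up split across two chunks; cutting at pauses in the speech would probably give better results.

```python
import wave


def split_wav(path, chunk_seconds=5.0, prefix="chunk"):
    """Split a WAVE file into consecutive pieces of at most chunk_seconds.

    Hypothetical helper for illustration only. Returns the list of
    output file paths, named <prefix>_000.wav, <prefix>_001.wav, ...
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        # Number of audio frames that make up one chunk.
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


# Example usage (assumes "interview.wav" is 16-bit, 16 kHz, mono):
# for piece in split_wav("interview.wav"):
#     print(piece)
```

Each resulting file can then be passed to the deepspeech binary separately and the transcripts joined afterwards.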


(Panybj) #3

Someone said that the GPU version of DeepSpeech will not combine words.