I’ve been testing the model that was released a few days ago. I recorded myself saying a few lines taken from the README.
The expected result:
> Alternatively, quicker inference can be performed using a supported NVIDIA GPU on Linux. (See below to find which GPU’s are supported.) This is done by instead installing the GPU specific package with the command:
> `pip install deepspeech-gpu`
The actual result can be seen in the image below.
Unfortunately I can’t upload the .wav file here; if it’s needed, I can upload it somewhere else.
Is this the expected performance of DeepSpeech? My hypothesis is that the language model wasn’t trained on the vocabulary I’m using. Is there anything to be gained by trying a different language model?
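In case it helps frame the question, here is a minimal sketch of how I understand an alternative language model would be plugged in through the Python bindings, assuming the v0.1-style API (`Model` / `enableDecoderWithLM` / `stt`). The file names (`my_lm.binary`, `my_trie`) are placeholders for a custom model, and the decoder weights are just the defaults from the release client, not values I’ve tuned:

```python
# Sketch only: assumes the v0.1-era Python API, where the client constructs a
# Model and optionally enables the KenLM-based beam-search decoder.
import scipy.io.wavfile as wav
from deepspeech.model import Model

N_FEATURES = 26   # MFCC features per frame (release client default)
N_CONTEXT = 9     # context window size (release client default)
BEAM_WIDTH = 500  # beam width for the decoder (release client default)

ds = Model('models/output_graph.pb', N_FEATURES, N_CONTEXT,
           'models/alphabet.txt', BEAM_WIDTH)

# Point the decoder at a custom language model. The three weights are the
# defaults from the release client (LM weight, word count weight, valid word
# count weight); my_lm.binary / my_trie are hypothetical custom files.
ds.enableDecoderWithLM('models/alphabet.txt', 'my_lm.binary', 'my_trie',
                       1.75, 1.00, 1.00)

# The model expects 16-bit mono 16 kHz audio.
fs, audio = wav.read('recording.wav')
print(ds.stt(audio, fs))
```

As far as I can tell, the shipped lm.binary is a KenLM model, so in principle one could build a domain-specific LM with KenLM and generate a matching trie (I believe there is a generate_trie tool in native_client for that, though I haven’t tried it myself).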