DeepSpeech giving bad results

I am new to DeepSpeech. I followed this link to create speech-to-text code, but my results are nowhere near the original speech. I am using DeepSpeech 0.6.1 and have installed the matching pretrained model. I am using this link to create my WAV file with the default options. Below is my code.

import numpy as np
import wave
from deepspeech import Model
from scipy.io import wavfile as wav
import speech_recognition as sr

audio_file = "D:/Dataset/DeepSpeech/nz.wav"
ds = Model('D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/output_graph.pbmm',500)
ds.enableDecoderWithLM('D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/lm.binary','D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/trie', 0.75, 1.85)
rate, audio = wav.read(audio_file)
transcript = ds.stt(audio)

I suspect this issue is caused by my audio format or something similar. Please help me with this; how can I make the most of the DeepSpeech library?


I used the configuration below to create the WAV file.

After that I used Audacity to export my .wav file as WAV (Microsoft) signed 16-bit PCM.

Also, I am getting different output from the command line and from my code, even though I have passed the lm.binary file and the trie in my code.
I don’t know how to generate the .wav file from my Python code, so I opted for this long process.

Below is my output:

original: newzeland run chase off to a solid start

command line: the news and ranges offers anitar

Code: and he also a

I am also attaching my audio file in case it helps (272.3 KB)

Command used to run the same file from the command line:

deepspeech --model output_graph.pbmm --lm lm.binary --trie trie --audio nz.wav

Note: I am using Windows 8.

We can’t help you if you don’t share how you produced your audio. The link you gave has thousands of options. Audio codec, format, recording conditions, speakers, etc.

Expected/Actual transcript would also help us.

I have updated my question. Please have a look.

Audio quality is very poor: there’s a lot of noise, and it sounds like you are talking in a tube. It took me at least 3 retries before I could hear “new zealand”, but I’m still unable to understand what you said.

It looks to me like you are speaking with a non-American English accent. Between the audio quality and your accent, your “command line” result is not surprising. There’s no good answer, sadly, except to try to improve the recording. As for your accent, the problem is that we don’t have enough data to properly handle non-American English accents. I have the same issue when I speak English to the model with my French accent.

As for the different result from your code: this is likely wav (the wavfile reader in your code) doing funny things. Please try with the wave module instead.
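To illustrate the suggestion, here is a minimal sketch of reading the file with the standard-library wave module. The helper name load_wav_int16 and its expected_rate parameter are made up for this example. The format checks are an assumption about what might be going wrong (for instance an Audacity export at a different sample rate or in stereo), not a confirmed diagnosis.

```python
import wave

import numpy as np


def load_wav_int16(path, expected_rate=16000):
    """Read a mono, 16-bit PCM WAV file into a 1-D np.int16 array.

    Hypothetical helper for illustration; raises ValueError when the
    file is not in the format assumed here (mono, 16-bit, expected_rate).
    """
    with wave.open(path, "rb") as fin:
        rate = fin.getframerate()
        channels = fin.getnchannels()
        width = fin.getsampwidth()  # bytes per sample; 2 means 16-bit
        frames = fin.readframes(fin.getnframes())

    if channels != 1 or width != 2:
        raise ValueError(
            "expected mono 16-bit PCM, got %d channel(s), %d-bit"
            % (channels, width * 8)
        )
    if rate != expected_rate:
        raise ValueError(
            "expected %d Hz, got %d Hz" % (expected_rate, rate)
        )

    # Interpret the raw bytes as signed 16-bit samples.
    return np.frombuffer(frames, dtype=np.int16)
```

With the ds model object from the question, usage would then be audio = load_wav_int16(audio_file) followed by transcript = ds.stt(audio). If the helper raises, the file needs to be re-exported or resampled before it is passed to the model.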

Thanks for your detailed response.
I hope the code itself is fine and doesn’t have any problems?
It would also help if you could tell me how I can create the .wav file from my Python code using a microphone. It should be a 16-bit int array at 16000 Hz.

We have several examples covering that:
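For a rough idea of what such an example involves, here is a minimal sketch (not taken from the official examples) that records from the default microphone and writes a mono, 16-bit, 16000 Hz WAV file. It assumes the third-party PyAudio package is installed; the function names record_microphone and write_wav_int16 are made up for illustration.

```python
import wave

RATE = 16000      # samples per second
CHANNELS = 1      # mono
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def record_microphone(seconds, chunk=1024):
    """Record raw 16-bit mono PCM bytes from the default input device."""
    # Third-party dependency (assumed installed); imported lazily so the
    # WAV-writing helper below works without it.
    import pyaudio

    pa = pyaudio.PyAudio()
    try:
        stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS,
                         rate=RATE, input=True, frames_per_buffer=chunk)
        try:
            n_chunks = int(RATE / chunk * seconds)
            frames = [stream.read(chunk) for _ in range(n_chunks)]
        finally:
            stream.stop_stream()
            stream.close()
    finally:
        pa.terminate()
    return b"".join(frames)


def write_wav_int16(path, pcm_bytes, rate=RATE):
    """Write raw 16-bit mono PCM bytes to a WAV file."""
    with wave.open(path, "wb") as fout:
        fout.setnchannels(CHANNELS)
        fout.setsampwidth(SAMPLE_WIDTH)
        fout.setframerate(rate)
        fout.writeframes(pcm_bytes)
```

Something like write_wav_int16("mic.wav", record_microphone(5)) would then save about five seconds of speech; the file name is only a placeholder.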