DeepSpeech giving bad results

indranil_ganguly · February 11, 2020, 12:42pm

I am new to DeepSpeech. I followed this link to write speech-to-text code, but my results are nowhere near the original speech. I am using DeepSpeech 0.6.1 and have installed the matching pretrained model. I am using this link to create my WAV file with the default options. Below is my code.

import numpy as np
import wave
from deepspeech import Model
from scipy.io import wavfile as wav
import speech_recognition as sr

audio_file = "D:/Dataset/DeepSpeech/nz.wav"
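# Load the acoustic model with a beam width of 500
ds = Model('D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/output_graph.pbmm', 500)
# Enable the language model: lm.binary, trie, lm_alpha=0.75, lm_beta=1.85
ds.enableDecoderWithLM(
    'D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/lm.binary',
    'D:/Dataset/DeepSpeech/deepspeech-0.6.1-models/models/trie',
    0.75, 1.85)
# Read the WAV file with scipy and pass the samples straight to the model
rate, audio = wav.read(audio_file)
print(audio)
transcript = ds.stt(audio)
print(transcript)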

I suspect this issue is caused by my audio format or something similar. Please help me with this issue. How can I make the most of the DeepSpeech library?

UPDATE:

I used the configuration below to create the WAV file.

After that, I used Audacity to export my .wav file as WAV (Microsoft) signed 16-bit PCM.

I am also getting different output from the command line and from my code, even though I pass the lm.binary and trie files in my code.
I don’t know how to generate the .wav file from my Python code, so I opted for this longer process.

Below is my output:

original: newzeland run chase off to a solid start

command line: the news and ranges offers anitar

Code: and he also a

I am also attaching my audio file in case it helps: nz.zip (272.3 KB)

Command used to run the same file from the command line:

deepspeech --model output_graph.pbmm --lm lm.binary --trie trie --audio nz.wav

*Note: I am using Windows 8.

lissyx · February 11, 2020, 12:54pm

We can’t help you if you don’t share how you produced your audio. The link you gave has thousands of options. Audio codec, format, recording conditions, speakers, etc.

Expected/Actual transcript would also help us.

indranil_ganguly · February 11, 2020, 2:35pm

I have updated my question. Please have a look.

lissyx · February 11, 2020, 2:50pm

Audio quality is very poor: there’s a lot of noise, and it sounds like you are talking in a tube. It took me at least 3 retries before I could hear “new zealand”, but I’m still unable to understand what you said.

It looks to me like you are speaking with a non-American English accent. Between the audio quality and your accent, your “command line” result is not surprising. There’s no good answer, sadly, except to try to improve the recording. As for your accent, the problem is that we don’t have enough data to properly handle non-American English accents. I have the same issue when I speak English to the model with my French accent.

As for the different output from your code compared to the command line: this is likely scipy.io.wavfile doing funny things. Please try the wave module instead.
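
A minimal sketch of what reading the file with the wave module could look like, using only the standard library. It also reports the channel count, sample width and sample rate, so a format mismatch becomes visible before the audio reaches the model. The helper name inspect_wav is hypothetical and not part of DeepSpeech; the 16000 Hz, mono, 16-bit expectation is an assumption based on the format discussed in this thread.

```python
import wave


def inspect_wav(path, expected_rate=16000):
    """Read a WAV file and report whether it is mono, 16-bit, expected_rate Hz.

    Hypothetical helper, not a DeepSpeech API. Returns (ok, info, frames),
    where frames holds the raw PCM bytes of the whole file.
    """
    with wave.open(path, "rb") as wf:
        info = {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "n_frames": wf.getnframes(),
        }
        frames = wf.readframes(wf.getnframes())
    ok = (
        info["channels"] == 1
        and info["sample_width_bytes"] == 2
        and info["sample_rate"] == expected_rate
    )
    return ok, info, frames


# Usage with the objects from the original post (assumes numpy is installed
# and ds is the Model created there):
#
#   import numpy as np
#   ok, info, frames = inspect_wav("D:/Dataset/DeepSpeech/nz.wav")
#   print(info)
#   if ok:
#       audio = np.frombuffer(frames, dtype=np.int16)
#       print(ds.stt(audio))
```

If the check fails (for example a stereo or 44100 Hz file), the samples would need to be converted before inference; passing them unchanged is one plausible reason for the code and the command line disagreeing.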

indranil_ganguly · February 11, 2020, 3:05pm

Thanks for your detailed response.
I hope the code is fine and doesn’t have any problems?
It would also help if you could tell me how to create the .wav file from my Python code using a microphone, as a 16-bit int array at 16000 Hz.

lissyx · February 11, 2020, 3:57pm

We have several examples covering that: GitHub - mozilla/DeepSpeech-examples: Examples of how to use or integrate DeepSpeech
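
For the file-writing half of that question, here is a rough sketch of saving samples as a mono, 16-bit, 16000 Hz WAV using only the standard library. It does not capture from a microphone: a generated sine tone stands in for the recorded samples, and the names write_pcm16_wav and sine_tone are hypothetical. Real capture needs a third-party audio library, which is what the linked examples cover.

```python
import math
import struct
import wave


def write_pcm16_wav(path, samples, sample_rate=16000):
    """Write integers in [-32768, 32767] as a mono 16-bit PCM WAV file."""
    samples = list(samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)  # 16000 Hz by default
        # Pack as little-endian signed 16-bit integers
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))


def sine_tone(freq_hz=440.0, seconds=1.0, sample_rate=16000, amplitude=0.3):
    """Stand-in for microphone input: a sine tone as a list of int16 values."""
    n = int(seconds * sample_rate)
    return [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    ]


# Example: write one second of test tone
# write_pcm16_wav("tone.wav", sine_tone())
```

If a capture library hands back raw 16-bit little-endian bytes, those can go to writeframes directly and the struct.pack step is unnecessary.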
