Different outputs when using DeepSpeech as python library

Hi team, I was using the pre-trained model to transcribe a short audio clip (transcript: “the table is badly glued and made so sloppily that it tilts”).
When I run the deepspeech command in the terminal, I get this output: “the tables badly glued and made so slowly that it duke”.
However, when I import the DeepSpeech model in Python and run the code below, I get this output: “the ables barly glued an made so sloly that it tuote”.
Can anyone explain why there is a difference here?
Thanks a lot.
Thank you team for the great project.

Below is my Python code:
from deepspeech.model import Model
import scipy.io.wavfile as wav
import sys
import os

ds = Model(sys.argv[1], 26, 9, sys.argv[2], 500)

pathToAudio = sys.argv[3]
audio_files = os.listdir(pathToAudio)
for eachfile in audio_files:
    if eachfile.endswith(".wav"):
        file_Path = pathToAudio + eachfile
        print("File to be read is ", file_Path)
        fs, audio = wav.read(file_Path)
        processed_data = ds.stt(audio, fs)
        print("Processed Data : ", processed_data)
        with open('output.txt', 'a') as f:
            f.write(processed_data)
            f.write("\r\n")
            f.write("\r\n")

Here is the terminal output:

quangtran@quangtran:~/DeepSpeech/DeepSpeech$ python3 mystt.py models/output_graph.pb models/alphabet.txt tmp/
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
2018-08-17 11:53:29.182887: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
File to be read is tmp/sen3.wav
Processed Data : the ables barly glued an made so sloly that it tuote
quangtran@quangtran:~/DeepSpeech/DeepSpeech$ deepspeech models/output_graph.pb /home/quangtran/DeepSpeech_AusTalk/sen3.wav models/alphabet.txt models/lm.binary models/trie
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Loading model from file models/output_graph.pb
2018-08-17 11:53:39.975545: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.306s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 2.006s.
Running inference.
the tables badly glued and made so slowly that it duke
Inference took 7.658s for 4.644s audio file.

It looks like you are not using the trie or language model in your code, whereas the standard command-line client uses them.

Hi @kdavis, yeah, I was not using the trie and language model because I didn’t know how to add those arguments. Could you help me with that? Or is there any documentation for it? Thanks a lot.

The best source is how it was done in the deepspeech client[1].
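For reference, the missing piece is the `enableDecoderWithLM()` call that the client makes after constructing the `Model`. Here is a minimal sketch along those lines; the hyperparameter names and values mirror the client’s defaults at the time, but they and the exact `enableDecoderWithLM()` signature vary between releases, so check the `client.py` shipped with your installed version:

```python
import os
import sys

# Decoder hyperparameters mirrored from the native client's defaults --
# these names/values are assumptions; verify them against the client.py
# of your installed DeepSpeech release.
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.75
WORD_COUNT_WEIGHT = 1.00
VALID_WORD_COUNT_WEIGHT = 1.00

def wav_files(audio_dir, names):
    """Full paths of the .wav files in a directory listing, sorted."""
    return [os.path.join(audio_dir, n) for n in sorted(names)
            if n.endswith(".wav")]

if len(sys.argv) >= 6:
    import scipy.io.wavfile as wav
    from deepspeech.model import Model

    model_path, alphabet_path, lm_path, trie_path, audio_dir = sys.argv[1:6]

    ds = Model(model_path, N_FEATURES, N_CONTEXT, alphabet_path, BEAM_WIDTH)
    # This is the call the plain-library snippet was missing: attach the
    # KenLM language model and trie, as the deepspeech CLI does.
    ds.enableDecoderWithLM(alphabet_path, lm_path, trie_path,
                           LM_WEIGHT, WORD_COUNT_WEIGHT,
                           VALID_WORD_COUNT_WEIGHT)

    for path in wav_files(audio_dir, os.listdir(audio_dir)):
        fs, audio = wav.read(path)
        print(path, "->", ds.stt(audio, fs))
```

The argument order (`model alphabet lm trie audio_dir`) is just this sketch’s choice; note it also joins paths with `os.path.join`, so the audio directory argument works with or without a trailing slash.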


Awesome, thanks @kdavis