Different outputs when using DeepSpeech as python library

Hi team, I was using the pre-trained model to transcribe a short audio clip (transcript: “the table is badly glued and made so sloppily that it tilts”).
When I run the deepspeech command in the terminal, I get this output: “the tables badly glued and made so slowly that it duke”.
However, when I import the DeepSpeech model in Python and run the code below, I get this output: “the ables barly glued an made so sloly that it tuote”.
Can anyone explain why there is a difference here?
Thanks a lot.
Thank you team for the great project.

Below is my Python code:
from deepspeech.model import Model
import scipy.io.wavfile as wav
import sys
import os

ds = Model(sys.argv[1], 26, 9, sys.argv[2], 500)

pathToAudio = sys.argv[3]
audio_files = os.listdir(pathToAudio)
for eachfile in audio_files:
    if eachfile.endswith(".wav"):
        file_Path = pathToAudio + eachfile
        print("File to be read is ", file_Path)
        fs, audio = wav.read(file_Path)
        processed_data = ds.stt(audio, fs)
        print("Processed Data : ", processed_data)
        with open('output.txt', 'a') as f:
            f.write(processed_data)
            f.write("\r\n")
            f.write("\r\n")

Here is the terminal output:

quangtran@quangtran:~/DeepSpeech/DeepSpeech$ python3 mystt.py models/output_graph.pb models/alphabet.txt tmp/
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
2018-08-17 11:53:29.182887: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
File to be read is tmp/sen3.wav
Processed Data : the ables barly glued an made so sloly that it tuote
quangtran@quangtran:~/DeepSpeech/DeepSpeech$ deepspeech models/output_graph.pb /home/quangtran/DeepSpeech_AusTalk/sen3.wav models/alphabet.txt models/lm.binary models/trie
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Loading model from file models/output_graph.pb
2018-08-17 11:53:39.975545: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Loaded model in 0.306s.
Loading language model from files models/lm.binary models/trie
Loaded language model in 2.006s.
Running inference.
the tables badly glued and made so slowly that it duke
Inference took 7.658s for 4.644s audio file.

It looks like you are not using the trie or language model in your code, whereas the standard command-line client uses them.

Hi @kdavis, yeah, I was not using the trie and language model because I didn’t know how to add those arguments. Could you help me with that? Or is there any documentation for it? Thanks a lot.

The best source is how it was done in the deepspeech client[1].
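For reference, the missing piece is the `enableDecoderWithLM()` call that the client makes after constructing the `Model`. Here is a minimal sketch along those lines; the hyperparameter names and values mirror the client’s defaults at the time, but they and the exact `enableDecoderWithLM()` signature vary between releases, so check the `client.py` shipped with your installed version:

```python
import os
import sys

# Decoder hyperparameters mirrored from the native client's defaults --
# these names/values are assumptions; verify them against the client.py
# of your installed DeepSpeech release.
N_FEATURES = 26
N_CONTEXT = 9
BEAM_WIDTH = 500
LM_WEIGHT = 1.75
WORD_COUNT_WEIGHT = 1.00
VALID_WORD_COUNT_WEIGHT = 1.00

def wav_files(audio_dir, names):
    """Full paths of the .wav files in a directory listing, sorted."""
    return [os.path.join(audio_dir, n) for n in sorted(names)
            if n.endswith(".wav")]

if len(sys.argv) >= 6:
    import scipy.io.wavfile as wav
    from deepspeech.model import Model

    model_path, alphabet_path, lm_path, trie_path, audio_dir = sys.argv[1:6]

    ds = Model(model_path, N_FEATURES, N_CONTEXT, alphabet_path, BEAM_WIDTH)
    # This is the call the plain-library snippet was missing: attach the
    # KenLM language model and trie, as the deepspeech CLI does.
    ds.enableDecoderWithLM(alphabet_path, lm_path, trie_path,
                           LM_WEIGHT, WORD_COUNT_WEIGHT,
                           VALID_WORD_COUNT_WEIGHT)

    for path in wav_files(audio_dir, os.listdir(audio_dir)):
        fs, audio = wav.read(path)
        print(path, "->", ds.stt(audio, fs))
```

The argument order (`model alphabet lm trie audio_dir`) is just this sketch’s choice; note it also joins paths with `os.path.join`, so the audio directory argument works with or without a trailing slash.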


Awesome, thanks @kdavis