Bad performance while using deepspeech-0.7.3-models.pbmm?

Based on this benchmark, my assumption was that DeepSpeech performs very well compared to Amazon Transcribe and other alternatives. However, after installing the latest version and testing with the script below, I found that the model performs really poorly:

from deepspeech import Model
import numpy as np
import pyaudio
import time
import time

# DeepSpeech parameters
model_file_path = '/Users/user/deepspeech-0.7.3-models.pbmm'
beam_width = 500
model = Model(model_file_path)
model.setBeamWidth(beam_width)
model.setScorerAlphaBeta(0.80, 1.85)
model.enableExternalScorer('/Users/user/deepspeech-0.7.3-models.scorer')

# Create a Streaming session
context = model.createStream()

# Encapsulate DeepSpeech audio feeding into a callback for PyAudio
text_so_far = ''
def process_audio(in_data, frame_count, time_info, status):
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    context.feedAudioContent(data16)
    text = context.intermediateDecode()
    if text != text_so_far:
        print('Interim text = {}'.format(text))
        text_so_far = text
    return (in_data, pyaudio.paContinue)

# PyAudio parameters
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK_SIZE = 1024

# Feed audio to deepspeech in a callback to PyAudio
audio = pyaudio.PyAudio()
stream = audio.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
    stream_callback=process_audio
)

print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()

try: 
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    # PyAudio
    stream.stop_stream()
    stream.close()
    audio.terminate()
    print('Finished recording.')
    # DeepSpeech
    text = context.finishStream()
    print('Final text = {}'.format(text))

The pre-trained model I am using, [deepspeech-0.7.3-models.pbmm](https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.pbmm), performs really poorly for English. Is the above the right configuration, or should I use different parameters, another language model, or fine-tune? My environment is the following:

absl-py==0.9.0
aiofiles==0.5.0
aiohttp==3.6.2
algoliasearch==2.3.0
APScheduler==3.6.3
astor==0.8.1
async-generator==1.10
async-timeout==3.0.1
attrs==19.3.0
beautifulsoup4==4.6.3
blis==0.2.4
boto3==1.13.26
botocore==1.16.26
cachetools==4.1.0
certifi==2020.4.5.2
cffi==1.14.0
chardet==3.0.4
cloudpickle==1.3.0
colorclass==2.2.0
coloredlogs==10.0
colorhash==1.0.2
cryptography==2.9.2
cycler==0.10.0
cymem==2.0.3
DataProperty==0.49.1
decorator==4.4.2
deepspeech==0.7.3
dnspython==1.16.0
docopt==0.6.2
docutils==0.15.2
fbmessenger==6.0.0
future==0.18.2
gast==0.2.2
geographiclib==1.50
geopy==1.18.1
gevent==1.5.0
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
greenlet==0.4.16
grpcio==1.29.0
gspread==3.0.1
h11==0.8.1
h2==3.2.0
h5py==2.10.0
hpack==3.0.0
hstspreload==2020.6.9
httplib2==0.18.1
httptools==0.1.1
httpx==0.9.3
humanfriendly==8.2
hyperframe==5.2.0
idna==2.8
importlib-metadata==1.6.1
jmespath==0.10.0
joblib==0.15.1
jsonpickle==1.4.1
jsonschema==3.2.0
kafka-python==1.4.7
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.2.0
mailchimp3==3.0.14
Markdown==3.2.2
matplotlib==3.2.1
mattermostwrapper==2.2
mbstrdecoder==1.0.0
msgfy==0.1.0
multidict==4.7.6
murmurhash==1.0.2
networkx==2.4
numpy==1.17.0
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.2.1
packaging==20.4
pandas==0.24.2
pathvalidate==2.3.0
pika==1.1.0
plac==0.9.6
preshed==2.0.1
prompt-toolkit==2.0.10
protobuf==3.12.2
psycopg2-binary==2.8.5
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyAudio==0.2.11
pycparser==2.20
pydot==1.4.1
PyJWT==1.7.1
pykwalify==1.7.0
pymongo==3.8.0
pyparsing==2.4.7
pyrsistent==0.16.0
PySocks==1.7.1
pytablewriter==0.54.0
python-crfsuite==0.9.7
python-dateutil==2.8.1
python-engineio==3.12.1
python-socketio==4.5.1
python-telegram-bot==12.7
pytz==2019.3
PyYAML==5.3.1
questionary==1.5.2
rasa==1.10.2
rasa-sdk==1.10.1
redis==3.5.3
requests==2.21.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
rocketchat-API==1.3.1
rsa==4.0
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
s3transfer==0.3.3
sanic==19.12.2
Sanic-Cors==0.10.0.post3
sanic-jwt==1.4.1
Sanic-Plugins-Framework==0.9.2
scikit-learn==0.22.2.post1
scipy==1.4.1
six==1.15.0
sklearn-crfsuite==0.3.6
slackclient==2.7.1
sniffio==1.1.0
spacy==2.1.9
SQLAlchemy==1.3.17
srsly==1.0.2
tabledata==1.1.2
tabulate==0.8.7
tcolorpy==0.0.5
tensorboard==2.1.1
tensorflow==2.1.1
tensorflow-addons==0.7.1
tensorflow-estimator==2.1.0
tensorflow-hub==0.8.0
tensorflow-probability==0.9.0
tensorflow-text==2.1.0rc0
termcolor==1.1.0
terminaltables==3.1.0
thinc==7.0.8
tornado==6.0.4
tqdm==4.45.0
twilio==6.26.3
typepy==1.1.1
tzlocal==2.1
ujson==2.0.3
urllib3==1.24.3
uvloop==0.14.0
wasabi==0.6.0
wcwidth==0.2.4
webexteamssdk==1.3
webrtcvad==2.0.10
websockets==8.1
Werkzeug==1.0.1
wrapt==1.12.1
yarl==1.4.2
zipp==3.1.0

This is the version I installed, as mentioned in the docs.

Without more actionable information, it’s hard to be definitive.
How bad is "bad"? What audio are you feeding it?
You report using your own code; do you reproduce the poor quality with our code?
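
For instance, a minimal sketch along these lines (the WAV path is just a placeholder) would feed a known-good 16 kHz mono recording straight into the model and help rule out the microphone/PyAudio path:

import wave
import numpy as np
from deepspeech import Model

ds = Model('/Users/user/deepspeech-0.7.3-models.pbmm')
ds.enableExternalScorer('/Users/user/deepspeech-0.7.3-models.scorer')

# Read a 16 kHz, 16-bit, mono WAV file (placeholder path)
with wave.open('/path/to/test.wav', 'rb') as w:
    assert w.getframerate() == 16000 and w.getnchannels() == 1
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# Batch transcription, no streaming or callbacks involved
print(ds.stt(audio))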

Why those values? The release page documents the following, and using different ones can have a very negative effect:

    lm_alpha 0.931289039105002
    lm_beta 1.1834137581510284
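
Applying the documented values in the script above would look like this (assuming the same `model` object):

# Scorer alpha/beta as documented on the 0.7.3 release page
model.setScorerAlphaBeta(0.931289039105002, 1.1834137581510284)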