Based on this benchmark, I assumed that DeepSpeech performs very well compared to Amazon Transcribe and other alternatives. However, after installing the latest version and testing with the script below, I found that the models perform really badly:
from deepspeech import Model, Stream
import numpy as np
import os
import pyaudio
import time
# DeepSpeech parameters
model_file_path = '/Users/user/deepspeech-0.7.3-models.pbmm'
beam_width = 500
model = Model(model_file_path)
model.setBeamWidth(500)
model.setScorerAlphaBeta(0.80,1.85)
model.enableExternalScorer('/Users/user/deepspeech-0.7.3-models.scorer')
# Create a Streaming session
context = model.createStream()
# Encapsulate DeepSpeech audio feeding into a callback for PyAudio
text_so_far = ''
def process_audio(in_data, frame_count, time_info, status):
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    Stream.feedAudioContent(context, data16)
    text = Stream.intermediateDecode(context)
    if text != text_so_far:
        print('Interim text = {}'.format(text))
        text_so_far = text
    return (in_data, pyaudio.paContinue)
# PyAudio parameters
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNK_SIZE = 1024
# Feed audio to deepspeech in a callback to PyAudio
audio = pyaudio.PyAudio()
stream = audio.open(
    format=FORMAT,
    channels=CHANNELS,
    rate=RATE,
    input=True,
    frames_per_buffer=CHUNK_SIZE,
    stream_callback=process_audio
)
print('Please start speaking, when done press Ctrl-C ...')
stream.start_stream()
try:
    while stream.is_active():
        time.sleep(0.1)
except KeyboardInterrupt:
    # PyAudio
    stream.stop_stream()
    stream.close()
    audio.terminate()
    print('Finished recording.')
    # DeepSpeech
    text = Stream.finishStream(context)
    print('Final text = {}'.format(text))
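
One detail in the configuration above that may be worth checking is the order of the scorer calls. As far as I understand the 0.7.x API (this is an assumption, not something confirmed here), `setScorerAlphaBeta` only takes effect once a scorer has been loaded with `enableExternalScorer`, so calling it first, as the script does, may leave the scorer's default alpha/beta in place. Below is a minimal sketch of a helper that applies the same three settings in that order. The name `configure_model` is hypothetical and not part of the `deepspeech` package; it only calls the `Model` methods already used in the script.

```python
def configure_model(model, scorer_path, beam_width=500, alpha=0.80, beta=1.85):
    """Apply decoder settings to a DeepSpeech Model-like object.

    Hypothetical helper (not part of the deepspeech package). It assumes
    that alpha/beta can only be set after an external scorer is loaded.
    """
    # Beam width does not depend on the scorer, so it can be set at any time.
    model.setBeamWidth(beam_width)
    # Load the external scorer before touching its weights.
    model.enableExternalScorer(scorer_path)
    # Now override the scorer's alpha (LM weight) and beta (word insertion weight).
    model.setScorerAlphaBeta(alpha, beta)
    return model
```

In the script this would stand in for the three `model.set...`/`model.enable...` lines, for example `configure_model(model, '/Users/user/deepspeech-0.7.3-models.scorer')`. Whether it changes the recognition quality is something that would need to be tested.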
The pre-trained model I am using, [deepspeech-0.7.3-models.pbmm](https://github.com/mozilla/DeepSpeech/releases/download/v0.7.3/deepspeech-0.7.3-models.pbmm), performs really badly for English. Is the above the right configuration, or should I use different parameters, another language model, or fine-tune the model? My environment is the following:
absl-py==0.9.0
aiofiles==0.5.0
aiohttp==3.6.2
algoliasearch==2.3.0
APScheduler==3.6.3
astor==0.8.1
async-generator==1.10
async-timeout==3.0.1
attrs==19.3.0
beautifulsoup4==4.6.3
blis==0.2.4
boto3==1.13.26
botocore==1.16.26
cachetools==4.1.0
certifi==2020.4.5.2
cffi==1.14.0
chardet==3.0.4
cloudpickle==1.3.0
colorclass==2.2.0
coloredlogs==10.0
colorhash==1.0.2
cryptography==2.9.2
cycler==0.10.0
cymem==2.0.3
DataProperty==0.49.1
decorator==4.4.2
deepspeech==0.7.3
dnspython==1.16.0
docopt==0.6.2
docutils==0.15.2
fbmessenger==6.0.0
future==0.18.2
gast==0.2.2
geographiclib==1.50
geopy==1.18.1
gevent==1.5.0
google-auth==1.16.1
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
greenlet==0.4.16
grpcio==1.29.0
gspread==3.0.1
h11==0.8.1
h2==3.2.0
h5py==2.10.0
hpack==3.0.0
hstspreload==2020.6.9
httplib2==0.18.1
httptools==0.1.1
httpx==0.9.3
humanfriendly==8.2
hyperframe==5.2.0
idna==2.8
importlib-metadata==1.6.1
jmespath==0.10.0
joblib==0.15.1
jsonpickle==1.4.1
jsonschema==3.2.0
kafka-python==1.4.7
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.2.0
mailchimp3==3.0.14
Markdown==3.2.2
matplotlib==3.2.1
mattermostwrapper==2.2
mbstrdecoder==1.0.0
msgfy==0.1.0
multidict==4.7.6
murmurhash==1.0.2
networkx==2.4
numpy==1.17.0
oauth2client==4.1.3
oauthlib==3.1.0
opt-einsum==3.2.1
packaging==20.4
pandas==0.24.2
pathvalidate==2.3.0
pika==1.1.0
plac==0.9.6
preshed==2.0.1
prompt-toolkit==2.0.10
protobuf==3.12.2
psycopg2-binary==2.8.5
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyAudio==0.2.11
pycparser==2.20
pydot==1.4.1
PyJWT==1.7.1
pykwalify==1.7.0
pymongo==3.8.0
pyparsing==2.4.7
pyrsistent==0.16.0
PySocks==1.7.1
pytablewriter==0.54.0
python-crfsuite==0.9.7
python-dateutil==2.8.1
python-engineio==3.12.1
python-socketio==4.5.1
python-telegram-bot==12.7
pytz==2019.3
PyYAML==5.3.1
questionary==1.5.2
rasa==1.10.2
rasa-sdk==1.10.1
redis==3.5.3
requests==2.21.0
requests-oauthlib==1.3.0
requests-toolbelt==0.9.1
rfc3986==1.4.0
rocketchat-API==1.3.1
rsa==4.0
ruamel.yaml==0.16.10
ruamel.yaml.clib==0.2.0
s3transfer==0.3.3
sanic==19.12.2
Sanic-Cors==0.10.0.post3
sanic-jwt==1.4.1
Sanic-Plugins-Framework==0.9.2
scikit-learn==0.22.2.post1
scipy==1.4.1
six==1.15.0
sklearn-crfsuite==0.3.6
slackclient==2.7.1
sniffio==1.1.0
spacy==2.1.9
SQLAlchemy==1.3.17
srsly==1.0.2
tabledata==1.1.2
tabulate==0.8.7
tcolorpy==0.0.5
tensorboard==2.1.1
tensorflow==2.1.1
tensorflow-addons==0.7.1
tensorflow-estimator==2.1.0
tensorflow-hub==0.8.0
tensorflow-probability==0.9.0
tensorflow-text==2.1.0rc0
termcolor==1.1.0
terminaltables==3.1.0
thinc==7.0.8
tornado==6.0.4
tqdm==4.45.0
twilio==6.26.3
typepy==1.1.1
tzlocal==2.1
ujson==2.0.3
urllib3==1.24.3
uvloop==0.14.0
wasabi==0.6.0
wcwidth==0.2.4
webexteamssdk==1.3
webrtcvad==2.0.10
websockets==8.1
Werkzeug==1.0.1
wrapt==1.12.1
yarl==1.4.2
zipp==3.1.0
This is what I installed by following the docs.