Real-time DeepSpeech Analysis using built-in microphone

How do I get the output_graph.pbmm file?

What is the process to get output_graph.pbmm?

Read the documentation

Which documentation? Can you provide me the link?

Hi! Starting from @duys’ script and @sehar_capricon’s issue, I adapted the script to match the __init__.py of DeepSpeech 0.6.0-g6d43e21 installed on Python 3.7. I then made my first attempts with the English pre-trained model (downloaded by following the documentation) and audio streaming (no .wav file). Here’s the code:

from deepspeech import Model
import numpy as np
import speech_recognition as sr

# Model and decoder parameters (the values recommended for the 0.6.0 release)
sample_rate = 16000
beam_width = 500
lm_alpha = 0.75
lm_beta = 1.85
# Note: n_features, n_context, and alphabet.txt from older scripts are no
# longer needed by the 0.6.0 API and have been dropped here.

# Paths to the pre-trained English model files
models_folder = 'deepspeech-0.6.0-models/'
model_name = models_folder + "output_graph.pbmm"
language_model = models_folder + "lm.binary"
trie = models_folder + "trie"

# Load the acoustic model and enable the language-model decoder
ds = Model(model_name, beam_width)
ds.enableDecoderWithLM(language_model, trie, lm_alpha, lm_beta)

# Capture one utterance from the default microphone
r = sr.Recognizer()
with sr.Microphone(sample_rate=sample_rate) as source:
    print("Say Something")
    audio = r.listen(source)
    # Convert the raw 16-bit PCM bytes into the int16 array stt() expects
    audio = np.frombuffer(audio.frame_data, np.int16)
    print(ds.stt(audio))
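The key step in the script above is converting speech_recognition’s raw frame_data (16-bit PCM bytes) into the numpy int16 array that ds.stt() expects. Here is that conversion on its own, with synthetic data so it needs neither a microphone nor the model:

```python
import numpy as np

def pcm_bytes_to_int16(frame_data):
    """Convert raw 16-bit PCM bytes (the format of sr.AudioData.frame_data)
    into the numpy int16 array DeepSpeech's stt() consumes."""
    return np.frombuffer(frame_data, dtype=np.int16)

# Synthetic example: two 16-bit samples round-tripped through bytes
raw = np.array([1000, -1000], dtype=np.int16).tobytes()
samples = pcm_bytes_to_int16(raw)
```

Each sample is two bytes, so a buffer of N bytes yields N/2 samples.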

Hope it helps


Please, I have been working on real-time speech-to-text, and I noticed DeepSpeech can give me what I want. But it seems the algorithm only accepts a WAV file, not a microphone, while I want to record and get text in real time. Does this approach actually work in real time?

No hijacking of old threads please, delete your post and start a new thread or simply google DS examples. Mic is not a problem.

Nobody is hijacking; I thought it was normal to ask a question under a thread that relates to one’s problem.

Reading doesn’t seem to be your strong suit either … I’m out

No, if you read the API carefully you can see it accepts neither a WAV file nor a microphone: the library accepts chunks of audio data. Feeding it from a file or from a microphone is your responsibility.
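To make that concrete, “chunks of audio data” just means fixed-size slices of the int16 sample stream, which the 0.6.0 streaming API consumes one call at a time (createStream / feedAudioContent / finishStream). A sketch of the chunking alone, with the model calls left out since they need the model files:

```python
import numpy as np

def iter_chunks(samples, chunk_size=1024):
    """Yield fixed-size chunks of an int16 sample array; the last chunk
    may be shorter. This is the shape of data a DeepSpeech stream
    consumes one feedAudioContent() call at a time."""
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]

# With the model it would look roughly like this (0.6.0 API, not run here):
#   stream = ds.createStream()
#   for chunk in iter_chunks(samples):
#       ds.feedAudioContent(stream, chunk)
#   print(ds.finishStream(stream))
```

Whether the samples come from a file, a microphone, or a network socket makes no difference to the model.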

Also, you are reviving a thread that is more than a year old and bringing in a different context.

There are multiple examples at https://github.com/mozilla/DeepSpeech-examples/ that already implement what you need; please read them.

Thanks for the link, @othiele. Sorry, I’m actually new here and just struggling with my current issues.