Hi, I tried running the following code on a Windows 10 machine. I have installed all the packages mentioned in the documentation.
import pyaudio
import deepspeech
import numpy as np
import wave

# Load DeepSpeech model
MODEL_PATH = "deepspeech-0.9.3-models.pbmm"
SCORER_PATH = "deepspeech-0.9.3-models.scorer"
model = deepspeech.Model(MODEL_PATH)
model.enableExternalScorer(SCORER_PATH)

def transcribe_live():
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 16000
    audio = pyaudio.PyAudio()
    stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE, input=True, frames_per_buffer=CHUNK)
    ds_stream = model.createStream()
    print("Listening... Speak now!")
    try:
        while True:
            data = stream.read(CHUNK, exception_on_overflow=False)
            ds_stream.feedAudioContent(np.frombuffer(data, dtype=np.int16))
            metadata = ds_stream.intermediateDecodeWithMetadata()
            if metadata.num_transcripts > 0:
                transcript = "".join(token.text for token in metadata.transcripts[0].tokens)
                print(f"Intermediate: {transcript}", end="\r")
    except KeyboardInterrupt:
        print("\nFinal transcription:", ds_stream.finishStream())
    stream.stop_stream()
    stream.close()
    audio.terminate()

# Start live transcription
transcribe_live()
When I ran it, I got the following error:
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
ERROR: Model provided has model identifier 'u/3�', should be 'TFL3'
Error at reading model file D:\Practice_Python\STT\deepspeech-0.9.3-models.pbmm
Traceback (most recent call last):
  File "D:\Practice_Python\STT\livestt.py", line 10, in <module>
    model = deepspeech.Model(MODEL_PATH)
  File "C:\Users\RK\anaconda3\envs\stt\lib\site-packages\deepspeech\__init__.py", line 38, in __init__
    raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Failed to initialize memory mapped model.' (0x3000)
To fix this error, I tried using the .tflite model instead of the .pbmm one. After this change, I get a new error:
TensorFlow: v2.3.0-6-g23ad988fcd
DeepSpeech: v0.9.3-0-gf2e9c858
Listening... Speak now!
Traceback (most recent call last):
  File "D:\Practice_Python\STT\livestt.py", line 42, in <module>
    transcribe_live()
  File "D:\Practice_Python\STT\livestt.py", line 31, in transcribe_live
    if metadata.num_transcripts > 0:
AttributeError: 'Metadata' object has no attribute 'num_transcripts'
How can I fix this error? I tried speaking into the microphone while the code was running, but it fails with the error above almost immediately. Also, is it mandatory to use the .tflite model instead of the .pbmm one?
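For reference, here is a minimal sketch of how the intermediate transcript could be read without `num_transcripts`. It assumes that the Python `Metadata` object exposes its candidates only as a `transcripts` list, which is what the AttributeError suggests but which I have not confirmed against the DeepSpeech source. The helper name `first_transcript` and the stand-in objects are hypothetical; they are there only so the snippet runs without a model or a microphone.

```python
from types import SimpleNamespace


def first_transcript(metadata):
    """Return the text of the top candidate transcript, or "" if there is none.

    Assumes `metadata.transcripts` is a list of candidates, each with a
    `tokens` list whose items carry a `text` attribute (assumption based on
    how the original loop already reads the tokens).
    """
    transcripts = getattr(metadata, "transcripts", None) or []
    if len(transcripts) == 0:
        return ""
    return "".join(token.text for token in transcripts[0].tokens)


# Hypothetical stand-ins that mimic the assumed shape of the Metadata object.
fake_tokens = [SimpleNamespace(text=ch) for ch in "hello"]
fake_metadata = SimpleNamespace(transcripts=[SimpleNamespace(tokens=fake_tokens)])
empty_metadata = SimpleNamespace(transcripts=[])

print(first_transcript(fake_metadata))
```

In the loop above, the `if metadata.num_transcripts > 0:` check would then become something like `transcript = first_transcript(metadata)` followed by `if transcript:`. On the model question, the first error says the identifier "should be 'TFL3'", which suggests that the installed runtime is a TFLite build and therefore expects the .tflite file. That is my reading of the message, not something I have verified.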