I am currently finishing my application, in which DeepSpeech is a central component, and I am already astonished by the results. I am using Python 3.6.8 and DeepSpeech 0.6.1.
One thing I haven’t managed to do is obtain a list or tuple of the candidates for the most recent word that was spoken/decoded with intermediateDecode. I read the documentation and searched this forum, but I am not sure if or how I can do this in combination with model.intermediateDecode (as opposed to model.stt and model.sttWithMetadata).
An example: I say “two”, DeepSpeech considers “two”, “to” and “too”, finally decides on one, and I get back a single string from intermediateDecode. How can I obtain the other words/candidates?
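To make clear what I mean, here is a toy sketch (not the DeepSpeech API — the function, the beam contents and the scores are all made up by me): a beam-search decoder keeps several scored hypotheses at every step, so in principle the candidates I am after already exist inside the decoder; I just don't see a way to get at them.

```python
import heapq

def top_candidates(hypotheses, n=3):
    """Return the n highest-scoring (word, score) pairs, best first."""
    return heapq.nlargest(n, hypotheses, key=lambda h: h[1])

# Hypothetical beam contents after I say "two" (scores invented):
beam = [("two", 0.62), ("to", 0.25), ("too", 0.11), ("do", 0.02)]
print(top_candidates(beam))  # [('two', 0.62), ('to', 0.25), ('too', 0.11)]
```

Something shaped like this return value is what I would like to obtain alongside the intermediate transcript.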
My code snippet for feeding/resampling/decoding (working):
import numpy as np
import pyaudio
import deepspeech
from scipy import signal

model = deepspeech.Model(MODEL_FILE_PATH, BEAM_WIDTH)
model.enableDecoderWithLM(LM_FILE_PATH, TRIE_FILE_PATH, LM_ALPHA, LM_BETA)
context = model.createStream()
text_so_far = ''

def process_audio(in_data, frame_count, time_info, status):
    global text_so_far
    data16 = np.frombuffer(in_data, dtype=np.int16)
    resample_size = int(len(data16) / RATE * 16000)  # RATE is 44.1 kHz in this case
    resample = signal.resample(data16, resample_size)  # using scipy
    resample16 = np.array(resample, dtype=np.int16)  # numpy
    model.feedAudioContent(context, resample16)  # feed the resampled chunk into the stream
    text = model.intermediateDecode(context)
    if text != text_so_far:
        text_so_far = text
        TextCut = cut_strings(text_so_far)
    return (in_data, pyaudio.paContinue)
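In case it helps anyone reproduce the resampling part without audio hardware, here is a self-contained check of just that step, using a synthetic sine wave in place of the PyAudio callback's in_data (the 100 ms chunk length and the 440 Hz tone are arbitrary choices of mine):

```python
import numpy as np
from scipy import signal

RATE = 44100         # capture rate from PyAudio
TARGET_RATE = 16000  # rate the DeepSpeech model expects

# One 100 ms chunk of synthetic int16 audio standing in for in_data
t = np.linspace(0, 0.1, int(RATE * 0.1), endpoint=False)
data16 = (np.sin(2 * np.pi * 440 * t) * 10000).astype(np.int16)

# Same arithmetic as in the callback above
resample_size = int(len(data16) / RATE * TARGET_RATE)
resample16 = np.array(signal.resample(data16, resample_size), dtype=np.int16)

print(len(data16), len(resample16))  # 4410 1600
```

The output chunk length comes out to len(data16) / 44100 * 16000 samples, which is what the model consumes per callback.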
Thank you very much in advance!