For our privacy-aware smart speaker I have tried to integrate DeepSpeech into the Python speech_recognition module. Unfortunately, it turns out that while it does recognize a bit, most words are recognized wrongly, or often only a very few words from a sentence come through at all. On top of that, it is very slow.
This contradicts several reports I have heard that it runs very well on the RPi 4.
The integration into SpeechRecognition can be seen here; it is only one function: https://github.com/fossasia/speech_recognition/blob/e452e9f3295232a6f5de0dc789acf2e1a4311f5c/speech_recognition/__init__.py#L846
What it basically does is:

```python
language_model_file = ".../lm.binary"
trie_file = ".../trie"
prot_buffer_file = ".../output_graph.tflite"
beam_width = 500
lm_alpha = 0.75
lm_beta = 1.85

ds = Model(prot_buffer_file, beam_width)
desired_sample_rate = ds.sampleRate()
ds.enableDecoderWithLM(language_model_file, trie_file, lm_alpha, lm_beta)

raw_data = audio_data.get_raw_data(convert_rate=desired_sample_rate, convert_width=2)
recognized_metadata = ds.sttWithMetadata(np.frombuffer(raw_data, np.int16))
```
The rest is just locating the files, checking parameters, and so on.
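One thing worth ruling out is the audio preprocessing, since feeding DeepSpeech samples at the wrong rate or width is a classic cause of both garbage output and slowness. Below is a minimal, self-contained sketch of the PCM-to-array step from the function above, using synthetic audio (no DeepSpeech required); the 16 kHz rate and the sine tone are assumptions for illustration:

```python
import numpy as np

desired_sample_rate = 16000  # the released DeepSpeech models expect 16 kHz mono

# Fake one second of 16-bit little-endian PCM, the format that
# get_raw_data(convert_rate=..., convert_width=2) should return
samples = (np.sin(2 * np.pi * 440 * np.arange(desired_sample_rate)
                  / desired_sample_rate) * 32767).astype(np.int16)
raw_data = samples.tobytes()

# The same conversion the integration performs before sttWithMetadata():
# interpret the raw bytes as a flat int16 array
audio = np.frombuffer(raw_data, np.int16)

# Sanity checks: correct dtype, correct length, lossless round-trip
assert audio.dtype == np.int16
assert len(audio) == desired_sample_rate
assert np.array_equal(audio, samples)
```

If a real recording passed through this path has a length that does not match its duration times 16000, the resampling (or a stereo/mono mix-up) is the first thing to investigate.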
Is there anything that sticks out as completely wrong?
Thanks for any comments.