Dear all, I am very new to DeepSpeech and have just started experimenting with it. I have installed DeepSpeech v0.9.3 on Windows 10 (GPU) and downloaded the pretrained model file. I have read the documentation carefully, but I still have two simple questions.
- I wrote a simple script:
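from deepspeech import Model

def audio2text():
    # Raw strings stop backslash sequences such as \a and \2 from being read as escapes.
    model_path = r"D:\Downloads\deepspeech-0.9.3-models.pbmm"
    audio_file = r"D:\Downloads\audio-0.9.3.tar\audio\2830-3980-0043.wav"
    audio_buffer = ?(audio_file)  # placeholder: this is the loading call I am asking about
    model = Model(model_path)
    text = model.stt(audio_buffer)
    print(text)

if __name__ == '__main__':
    audio2text()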
I just want to check whether I can convert audio to text. Since stt expects a 16-bit, mono raw audio signal, can anybody tell me how to read an audio file (e.g., a WAV file) into audio_buffer in the function above?
- I also want to fine-tune a pretrained model, and I think I need to use the training.deepspeech_training.train script. However, the following import fails:
from ds_ctcdecoder import ctc_beam_search_decoder, Scorer
I don’t have the ds_ctcdecoder module. I guess this might be because I installed on Windows. I searched the old issues but didn’t find an answer.
Thank you very much for your kind help.
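
For the first question, one possible approach is sketched below. It has not been verified on this exact Windows setup. It reads the samples with Python's standard wave module and wraps them in a NumPy int16 array, which is the kind of buffer stt expects. The helper name read_wav_as_int16 and the parameters of audio2text are illustrative only. Model.sampleRate() and Model.stt() come from the DeepSpeech 0.9 Python API. The sketch assumes the file is already 16-bit mono at the model's sample rate and does no resampling.

```python
import wave

import numpy as np


def read_wav_as_int16(path):
    """Read a 16-bit mono WAV file; return (int16 sample array, sample rate)."""
    with wave.open(path, "rb") as wav:
        # Fail early on formats this sketch does not handle.
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # Interpret the raw PCM bytes as signed 16-bit integers.
    return np.frombuffer(frames, dtype=np.int16), sample_rate


def audio2text(model_path, audio_file):
    # Imported here so the WAV helper can be tried without deepspeech installed.
    from deepspeech import Model

    model = Model(model_path)
    audio_buffer, sample_rate = read_wav_as_int16(audio_file)
    # Assumption: no resampling is attempted, so a mismatch is reported instead.
    if sample_rate != model.sampleRate():
        raise ValueError(
            f"file is {sample_rate} Hz, model expects {model.sampleRate()} Hz"
        )
    return model.stt(audio_buffer)
```

It could then be called as print(audio2text(r"D:\Downloads\deepspeech-0.9.3-models.pbmm", r"D:\Downloads\audio-0.9.3.tar\audio\2830-3980-0043.wav")), with raw strings so the backslashes in the Windows paths are kept literally.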