Newbie questions: using DeepSpeech for voice transcription, and ds_ctcdecoder, thanks

Junyong_You · June 30, 2021, 4:31pm

Dear all, I am very new to DeepSpeech and have just started experimenting with it. I have installed DeepSpeech v0.9.3 on Windows 10 with a GPU and downloaded the pretrained model file. I have read the documentation carefully, but I still have two simple questions.

I wrote a simple script:

from deepspeech import Model

def audio2text():
    # Raw strings keep backslashes literal (otherwise "\a" and "\2" are read as escape sequences)
    model_path = r"D:\Downloads\deepspeech-0.9.3-models.pbmm"
    audio_file = r"D:\Downloads\audio-0.9.3.tar\audio\2830-3980-0043.wav"

    audio_buffer = ?(audio_file)

    model = Model(model_path)
    text = model.stt(audio_buffer)

if __name__ == '__main__':
    audio2text()

I just want to see whether it is possible to convert audio to text. Since stt takes a 16-bit, mono raw audio signal, can anybody tell me how to read an audio file (e.g., a WAV file) into audio_buffer in the function above?

I also want to fine-tune a pretrained model, and I think I need to use the training.deepspeech_training.train script. However, the line "from ds_ctcdecoder import ctc_beam_search_decoder, Scorer" fails on import because I don’t have the ds_ctcdecoder module. I guess this might be due to installing on Windows. I searched the old issues but didn’t find an answer.

Thank you very much for your kind help.

lissyx · June 30, 2021, 4:34pm

Training on Windows is not supported, but ds_ctcdecoder should be available: https://pypi.org/project/ds-ctcdecoder/0.9.3/#files

Then again, it depends on your Python version. Since you provided no info, we can’t help.
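As a rough illustration of the kind of info that helps here, the sketch below prints the Python version, the platform, and whether ds_ctcdecoder can be found by the current interpreter. The function name environment_report is made up for this example and is not part of DeepSpeech.

```python
import importlib.util
import platform
import sys


def environment_report(module_name="ds_ctcdecoder"):
    """Collect basic details worth including in a support question.

    `environment_report` is a hypothetical helper for illustration only.
    """
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Wheels on PyPI are built per interpreter version and architecture.
        "bits": "64-bit" if sys.maxsize > 2**32 else "32-bit",
        # find_spec returns None when a top-level module is not installed.
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If module_found is False, the Python version and bitness printed here can be compared against the wheel filenames listed on the PyPI page above.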

Please have a look at the examples and native_client/python/client.py
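For the first question (reading a WAV file into audio_buffer), here is a minimal sketch in the spirit of what client.py does: read the frames with the standard wave module and view them as a NumPy int16 array. The helper names read_wav_as_int16 and transcribe are made up for this example. It assumes NumPy is installed and that the file is already 16-bit mono PCM at the sample rate the model expects; it does no resampling or format conversion.

```python
import wave

import numpy as np


def read_wav_as_int16(path):
    """Return (samples, sample_rate) for a 16-bit mono PCM WAV file.

    Hypothetical helper for illustration; it rejects other formats
    instead of converting them.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # Interpret the raw bytes as signed 16-bit integers, as stt expects.
    return np.frombuffer(frames, dtype=np.int16), sample_rate


def transcribe(model_path, audio_path):
    """Run speech-to-text on one file (hypothetical wrapper)."""
    # Imported here so the WAV helper works without deepspeech installed.
    from deepspeech import Model

    model = Model(model_path)
    audio, sample_rate = read_wav_as_int16(audio_path)
    if sample_rate != model.sampleRate():
        raise ValueError(
            f"audio is {sample_rate} Hz, model expects {model.sampleRate()} Hz"
        )
    return model.stt(audio)
```

In the script from the original post, this would amount to replacing the "?" placeholder with a call such as audio_buffer, _ = read_wav_as_int16(audio_file), and printing or returning text at the end.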

Junyong_You · June 30, 2021, 4:40pm

Thanks a lot for the information. Sorry, I didn’t study the guidance properly before posting the question. After reading the Discourse threads, I have already solved the second question. Thank you very much.