Recognize/Transcribe WAV files in bulk


I am using DeepSpeech to transcribe audio files in bulk, generally with the script below. It takes a long time because the model is loaded again for every file.

Please note: I don’t have the exact source transcripts for these audio clips.

Is there any quick way to transcribe the audio in bulk?

    import subprocess
    import pandas as pd

    csv_path = 'test.csv'
    files = []  # collected (path, sentence) rows
    with open(csv_path, 'r', encoding='utf-8') as my_file:
        for line in my_file:
            columns = line.strip().split(',')
            wav_path = columns[1]
            if wav_path != 'wav_filename':  # skip the header row
                # Each call starts a new process and reloads the model, which is the slow part.
                proc = subprocess.Popen(
                    ["deepspeech", "--model", "model/output_graph.pb",
                     "--lm", "../dependencies_swiss/lm.binary",
                     "--trie", "../dependencies_swiss/trie",
                     "--audio", wav_path],
                    stdout=subprocess.PIPE)
                output = proc.communicate()[0]
                output = output.decode('utf-8', 'ignore')
                files.append((wav_path, output.strip()))
    df = pd.DataFrame(data=files, columns=["path", "sentence"])
    df.to_csv("model/submission-test.csv", index=False)

The C++ native client (downloadable from the GitHub release files in the native_client.{target}.tar.xz archives) supports passing a folder as input and will transcribe all audio files in it.

You could also just use the Python bindings since you are writing Python code, and load the model once …
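
To illustrate the suggestion above, here is a minimal sketch of loading the model once and reusing it for every row of the CSV. It assumes a DeepSpeech release whose Python bindings accept `Model(model_path)` and `stt()` on a 16-bit integer NumPy array; the constructor arguments and the language model setup differ between releases, so check the documentation for the installed version. The helper names (`read_wav_frames`, `make_deepspeech_transcriber`, `transcribe_csv`) are made up for this example, and the WAV files are assumed to be 16-bit mono at the sample rate the model expects.

```python
import csv
import wave


def read_wav_frames(path):
    """Return (sample_rate, raw PCM bytes) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.readframes(wav.getnframes())


def make_deepspeech_transcriber(model_path):
    """Load the model once and return a function mapping a WAV path to text.

    Assumption: the installed bindings support Model(model_path) and
    Model.stt(int16_array). Older releases need extra constructor
    arguments, and language model setup is left out here on purpose.
    """
    import numpy as np              # imported lazily, only needed here
    from deepspeech import Model    # imported lazily, only needed here

    model = Model(model_path)       # the expensive step, done a single time

    def transcribe(path):
        _, frames = read_wav_frames(path)
        audio = np.frombuffer(frames, dtype=np.int16)
        return model.stt(audio)

    return transcribe


def transcribe_csv(csv_path, transcribe, out_path):
    """Transcribe the file named in column 1 of each row; write path,sentence."""
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["path", "sentence"])
        for row in csv.reader(src):
            # Skip short rows and the header row.
            if len(row) < 2 or row[1] == "wav_filename":
                continue
            writer.writerow([row[1], transcribe(row[1])])
```

With the paths from the question, usage would be roughly `transcribe_csv("test.csv", make_deepspeech_transcriber("model/output_graph.pb"), "model/submission-test.csv")`. Passing the transcriber in as a function keeps the CSV loop independent of the DeepSpeech version, so only `make_deepspeech_transcriber` needs adjusting if the bindings differ.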