Extract 20 ms transition probability matrices from a wav/MFCC file


I have read papers that describe extracting 20 ms transition probability matrices from a wav/MFCC file with DeepSpeech's RNN. I searched through the GitHub issues and this Mozilla forum but couldn't find anything. Sorry if this has been asked before somewhere.

May I ask how this can be done? Is there a direct batch command or something for this?

Kind regards


How do those papers relate to DeepSpeech?

On the training side, the MFCC features in the graph are extracted here: https://github.com/mozilla/DeepSpeech/blob/master/training/deepspeech_training/util/feeding.py#L42-L45

On the inference side: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/deepspeech.cc#L160-L168

The inference implementation is here: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/tflitemodelstate.cc#L392-L421
and here: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/tfmodelstate.cc#L247-L264
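For illustration, here is a rough sketch of what such a per-frame matrix could look like once you have the raw acoustic model outputs in hand. It assumes that the "transition probability matrices" in those papers are simply the per-frame character probabilities (one row per 20 ms step, one column per output symbol), which may not match every paper's definition. The function names and the example scores below are hypothetical and are not part of the DeepSpeech API; getting the real logits out of the model is the part covered by the links above.

```python
import math

# Assumed step between consecutive frames, taken from the question (20 ms).
FRAME_STEP_MS = 20


def softmax(row):
    """Numerically stable softmax over one frame's raw scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def frame_probabilities(logits):
    """Turn a T x C list of raw scores into a T x C matrix of probabilities.

    T is the number of frames, C the number of output symbols.
    """
    return [softmax(row) for row in logits]


def frame_times_ms(num_frames, step_ms=FRAME_STEP_MS):
    """Start time in milliseconds of each frame, assuming a fixed step."""
    return [i * step_ms for i in range(num_frames)]


# Hypothetical raw scores for 3 frames and 3 symbols (not real model output).
example_logits = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 0.3],
    [0.0, 0.0, 3.0],
]

probs = frame_probabilities(example_logits)
times = frame_times_ms(len(probs))

for start_ms, row in zip(times, probs):
    print(start_ms, [round(p, 3) for p in row])
```

Each printed row is one 20 ms frame; the values in a row sum to 1 and can be stacked into the matrix the papers feed to their downstream models.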

Thank you very much for the quick answers. I'll check them out.

Mainly this paper used DeepSpeech transition probabilities, as did others such as VOCA.