Extract 20 ms transition probability matrices from a wav/MFCC file

spaceharry · March 2, 2021, 7:25pm

Hi

I have read papers that describe extracting 20 ms transition probability matrices from a wav/MFCC file with DeepSpeech's RNN. I searched through the GitHub issues and this Mozilla forum but couldn't find anything; sorry if this has been asked before somewhere.

May I ask how this can be done? Is there a direct batch command or something for this?

kind regards

phil

lissyx · March 2, 2021, 7:37pm

How do those papers relate to DeepSpeech?

lissyx · March 2, 2021, 7:38pm

MFCC in the graph are extracted here: https://github.com/mozilla/DeepSpeech/blob/master/training/deepspeech_training/util/feeding.py#L42-L45
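
For a rough idea of where the 20 ms figure comes from: assuming a 32 ms feature window and a 20 ms step at 16 kHz (I believe these are the default `feature_win_len` / `feature_win_step` values, but check the flags of the checkpoint you use), each MFCC frame, and so each row of the model output, advances by 20 ms. The helper `count_frames` below is hypothetical, and the no-padding framing is an assumption, not something taken from the linked code.

```python
def count_frames(num_samples, sample_rate=16000, win_ms=32, step_ms=20):
    """Count the full analysis windows that fit in a signal (no padding).

    Window and step lengths are assumed defaults, not read from a model.
    """
    win = sample_rate * win_ms // 1000    # window length in samples
    step = sample_rate * step_ms // 1000  # hop between windows in samples
    if num_samples < win:
        return 0
    return 1 + (num_samples - win) // step


# One second of 16 kHz audio gives roughly 50 frames, one every 20 ms.
one_second = count_frames(16000)
print(one_second)
```

So a clip of N seconds should yield a matrix with about 50 * N rows, which is a quick sanity check on whatever you extract.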

lissyx · March 2, 2021, 7:39pm

On the inference side: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/deepspeech.cc#L160-L168

lissyx · March 2, 2021, 7:41pm

Inference implementation here: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/tflitemodelstate.cc#L392-L421
and here: https://github.com/mozilla/DeepSpeech/blob/8c8b80dc0bc39701d5cab51fa878133e48cdb59e/native_client/tfmodelstate.cc#L247-L264
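
As far as I can tell, what those papers call a per-frame probability matrix is the acoustic model output before CTC decoding: one row per 20 ms step, one column per alphabet character plus the CTC blank. Below is a minimal sketch of turning raw per-frame scores into such a matrix. The numbers are made up, `softmax_rows` is a hypothetical helper and not part of the DeepSpeech API, and whether the exported graph already applies the softmax is something to verify in the code linked above.

```python
import math


def softmax_rows(logits):
    """Normalize each row of raw scores into a probability distribution."""
    probs = []
    for row in logits:
        m = max(row)                           # subtract max for stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


# Toy input: 3 time steps, 4 classes (say 3 characters + CTC blank).
logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
probs = softmax_rows(logits)

# Assuming a 20 ms step, row i starts at roughly i * 20 ms.
frame_times_ms = [i * 20 for i in range(len(probs))]
for t, row in zip(frame_times_ms, probs):
    print(t, [round(p, 3) for p in row])
```

With a real model you would take the per-step output of the inference call instead of the toy `logits`; the shape should be (number of steps, alphabet size + 1).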

spaceharry · March 2, 2021, 7:53pm

Thank you very much for the quick answers. I'll check them out.

Mainly this paper used DeepSpeech transition probabilities, and so did others such as VOCA.