Custom LM binary for digits and a small set of commands

Hi, I am experimenting with a small custom LM that consists mostly of digit combinations (all digit combinations should be recognized) plus a small set of non-digit sentences (e.g. “all is good”). The two types never occur together in a sentence. The LM binary and trie generated from this vocabulary work fine for non-digit sentences with the default tflite model provided for v0.5.1. For digit combinations, however, I observed that sequences occurring in the vocabulary are recognized with a much higher probability than digit sentences not in the vocabulary (e.g. “five seven five nine”). Am I missing something here?

Sharing the ARPA file together with the corresponding LM binary and trie files (93.8 KB).

Command used:

~/terminal/kenlm/build/bin/lmplz --text vocabulary.txt --arpa --order 5 --discount_fallback --temp_prefix /tmp/

Generate the LM binary:

~/terminal/kenlm/build/bin/build_binary -T -s trie lm.binary

Generate the trie:

~/terminal/repository/DeepSpeech/generate_trie alphabet.txt lm.binary trie
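On the digit question above: an n-gram LM assigns higher probability to word sequences that appear in its training text than to unseen ones, which are scored through backoff. One thing that might reduce the gap is a vocabulary file that lists digit sequences exhaustively up to some length, so that no particular combination is favoured. The sketch below is only an illustration of that idea: the function names, the maximum length of 4 and the output file name are assumptions, and whether this actually improves recognition here has not been verified.

```python
import itertools

# Spoken forms of the ten digits; adjust to match the words in your own corpus.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def digit_sentences(max_len):
    """Yield every digit-word sequence of length 1..max_len, one sentence each."""
    for n in range(1, max_len + 1):
        for combo in itertools.product(DIGIT_WORDS, repeat=n):
            yield " ".join(combo)


def write_corpus(path, max_len, extra_sentences=()):
    """Write all digit sentences plus any extra command sentences to `path`.

    Returns the number of lines written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for sentence in digit_sentences(max_len):
            f.write(sentence + "\n")
            count += 1
        for sentence in extra_sentences:
            f.write(sentence + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical settings: sequences of up to 4 digits plus one command sentence.
    # 10 + 100 + 1000 + 10000 digit sentences, plus 1 command, are written.
    written = write_corpus("vocabulary.txt", max_len=4,
                           extra_sentences=["all is good"])
    print(written, "lines written")
```

The resulting file would then be passed to lmplz via --text as in the commands above. The file grows by a factor of ten with each extra digit of length, so longer sequences would still have to rely on backoff to shorter n-grams.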

Are you working with v0.5.1, or are you comparing your results to 0.5.1?

I’m working with v0.5.1. I built from source to get the deepspeech executable.
Here is the output I get:

$ ./deepspeech --model data/all_combinations/output_graph.pbmm -t --extended --alphabet data/alphabet.txt --lm data/all_combinations/tmp/lm.binary --trie data/all_combinations/tmp/trie --audio ../../learning2/mobile_recorded/02_Jan_testing/1577956542158_denoised.wav

Also note that the -t flag doesn’t print MFCC data as I expected it to. Where should I look to enable MFCC output?

May I ask why?

I’m not sure what you are referring to, -t gives timing results, and I see cpu_time_overall, so it’s working as intended.

That makes no sense. What output do you want? MFCCs are a transformation of the audio signal, used as the input to the network.

In the deepspeech executable’s documentation, the -t flag has the following description:

That’s why I got confused. Clarified now:

I am doing multiple things here:

  • Generate a custom LM from a limited set of sentences (digits and a few simple commands)
  • See how deepspeech performs recognition step by step (hence the MFCC and intermediate outputs)
  • Tinker with the CTC decoder code and check how the LM changes my results

OK, you got confused because it just means we output the time it took to perform MFCC + inference, not that we output the MFCCs.

Yes. I have made a few changes in the CTC decoder (as of now just dummy prints), but I don’t see the changes reflected even after running pip uninstall ds_ctcdecoder and then pip install native_client/ctcdecode/dist/ds_ctcdecoder-0.5.1-cp37-cp37m-macosx_10_14_x86_64.whl. Is deepspeech using some cached version of the CTC decoder? If yes, how do I clear it?

No, there is no such cache. Please triple check your build / install steps.
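One way to check the install step is to ask Python which file it would actually load for ds_ctcdecoder, and when that file was last modified. The sketch below uses only the standard library; the helper names are made up for illustration. It has to be run with the same interpreter (and virtualenv) that the pip commands above used, and it only covers the Python package. It does not tell you whether a separately built ./deepspeech binary contains the modified decoder code, since that depends on rebuilding the native client.

```python
import importlib.util
import os
import time


def module_location(name):
    """Return the path Python would import `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def describe(name):
    """Return a one-line summary of where `name` lives and when it was last modified."""
    path = module_location(name)
    if path is None or not os.path.exists(path):
        return f"{name}: no file on disk found by this interpreter"
    mtime = time.strftime("%Y-%m-%d %H:%M:%S",
                          time.localtime(os.path.getmtime(path)))
    return f"{name}: {path} (modified {mtime})"


if __name__ == "__main__":
    # If the path points somewhere unexpected, or the timestamp is older than
    # your last build, the freshly built wheel is not the one being imported.
    print(describe("ds_ctcdecoder"))
```

If the reported path or timestamp does not match the wheel you just built, an older copy is probably still installed in that environment.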