Can I use the pre-trained model with DeepSpeech.py?

Kim · December 18, 2017, 8:32am

Hi,

Please help me figure out how to use the pre-trained model with DeepSpeech.py.

I tried passing the files from the pre-trained model release to the corresponding DeepSpeech.py arguments:

--decoder_library_path
--alphabet_config_path
--lm_binary_path
--lm_trie_path

But I got an error saying:

tensorflow.python.framework.errors_impl.NotFoundError: deepspeech-0.1.0-models/models/output_graph.pb: invalid ELF header

I’m not sure how to fix this.

The pre-trained models worked fine with the deepspeech command line tool.

My goal is to get the output matrix instead of the text output. Is there a workaround for this?

Thank you.

kdavis · December 18, 2017, 9:44am

To clarify, what do you mean by “output matrix”?

reuben · December 18, 2017, 10:32am

You can’t use the released model with DeepSpeech.py unless you make significant changes to it. There’s no command line parameter that expects a frozen model to be passed in. From the error, it looks like you may have passed --decoder_library_path models/models/output_graph.pb, when you should be passing the path to libctc_decoder_with_kenlm.so instead.

yv001 · December 18, 2017, 11:25am

My guess would be that Kim is looking for the probabilities of each character rather than the most likely character (before the language model is applied), something like this https://cdn-images-1.medium.com/max/1000/1*d1ktMdOnFOJRKKyjFP6sqQ.png.

If that’s the case, I’d be interested in it too (e.g. as a param of the commandline deepspeech).

Kim · December 18, 2017, 9:28pm

Sorry about the confusion. Like yv001 said, by output matrix I meant the probabilities of each character.

kdavis · December 19, 2017, 10:28am

This would require writing a custom C++ operator that modifies our current custom C++ operator for CTC decoding[1] and would be a significant amount of work.

reuben · December 19, 2017, 10:42am

You don’t necessarily have to use DeepSpeech.py to access the logits. You could also write some code to load the release model using the TensorFlow API, then fetch the “logits” node. You could try modifying this code to load from the frozen graph instead of restoring a checkpoint: https://github.com/mozilla/DeepSpeech/blob/99d0c311a3e6108ed835fa38e4680ba7b3744fad/DeepSpeech.py#L1748-L1775

The relevant TensorFlow API is tf.import_graph_def.
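A minimal sketch of that approach, not taken from the DeepSpeech code base. It assumes a TensorFlow version that provides tf.compat.v1 (1.13 or later). Only the “logits” node name comes from this thread; the input node names "input_node" and "input_lengths" are assumptions that should be checked against the graph you actually load, and the helper names are hypothetical. Computing the audio features is left out.

```python
def load_frozen_graph(pb_path):
    """Parse a frozen .pb file and import it into a fresh tf.Graph."""
    import tensorflow as tf  # imported lazily so the helpers below work without it

    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        # name="" keeps the original node names (no "import/" prefix).
        tf.import_graph_def(graph_def, name="")
    return graph


def tensor_name(node_name, output_index=0):
    """Turn a node name such as 'logits' into a tensor name such as 'logits:0'."""
    return "{}:{}".format(node_name, output_index)


def fetch_logits(graph, features, lengths,
                 input_node="input_node",       # assumption: verify in your graph
                 lengths_node="input_lengths",  # assumption: verify in your graph
                 logits_node="logits"):         # node name mentioned in this thread
    """Run the graph once and return the raw logits as a NumPy array."""
    import tensorflow as tf

    with tf.compat.v1.Session(graph=graph) as session:
        return session.run(
            graph.get_tensor_by_name(tensor_name(logits_node)),
            feed_dict={
                graph.get_tensor_by_name(tensor_name(input_node)): features,
                graph.get_tensor_by_name(tensor_name(lengths_node)): lengths,
            },
        )
```

If the assumed node names don’t match, listing [op.name for op in graph.get_operations()] on the loaded graph shows what is actually there.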

yv001 · January 9, 2018, 10:16am

A debugging script that shows most likely characters and their probabilities can be found here:
https://github.com/pvanickova/DeepSpeech/blob/master/bin/show_inferred_characters.py

It loads a frozen graph, runs inference on the given WAV file with the given model and alphabet, applies softmax to the logits to get probabilities, and displays the most likely characters with their probabilities. “-” is used for the blank symbol. Each character prediction is on a new line, so the predicted text can be read in columns.

E.g. the string “cat” could have these character predictions:
c k - (0.999957) (1.62438e-05) (6.99057e-06)
a - (0.999998) (1.05044e-06) (4.43978e-07)
t d - (0.999999) (4.27885e-07) (1.09088e-07)
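The softmax-and-rank step described above can be sketched roughly as follows. This is an illustration in plain Python, not the code from the linked script. It assumes the logits arrive as one row of scores per time step and that the blank symbol is the last class; the function names are hypothetical.

```python
import math


def softmax(row):
    """Convert one time step's logits into probabilities that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_characters(logits, alphabet, count=3, blank="-"):
    """For each time step, return the `count` most likely (symbol, probability) pairs."""
    symbols = list(alphabet) + [blank]  # assumption: blank is the last class
    predictions = []
    for row in logits:
        if len(row) != len(symbols):
            raise ValueError("expected one logit per alphabet symbol plus blank")
        probs = softmax(row)
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        predictions.append([(symbols[i], probs[i]) for i in ranked[:count]])
    return predictions


# Made-up logits for a three-symbol alphabet plus blank, two time steps.
example_logits = [[9.0, 1.0, 0.5, 0.0], [0.2, 8.0, 0.1, 1.5]]
for step in top_characters(example_logits, "cat", count=2):
    print(" ".join(symbol for symbol, _ in step),
          " ".join("({:g})".format(prob) for _, prob in step))
```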

This is how the script is run:

python3 ./bin/show_inferred_characters.py --input-file "../data/my.wav" --model-file ../data/models/output_graph.pb --alphabet-file ../data/models/alphabet.txt --predicted-character-count 3

If it looks useful enough for others, give me a shout and I’ll create a pull request.

yv001 · December 27, 2019, 3:12pm

The script above has been updated to work with 0.6.0 frozen deepspeech pb models.