How can we get log probabilities of decoded outputs

pranoothatwar · January 21, 2019, 11:35am

Hello,
I am trying to use output_graph.pb to generate the logits and tf.nn.ctc_beam_search_decoder() to get the output sequence, but the output is gibberish. By contrast, when I use the deepspeech API, it transcribes the audio correctly.

Is there any way I can get correct log probabilities of the decoded outputs using the deepspeech API?
How can I use tf.nn.ctc_beam_search_decoder() to get the correct sequence?

Kindly let me know if I am doing anything wrong at any step.

lissyx · January 21, 2019, 1:49pm

It’s hard to answer your question without more details on the versions of the tools you used. Specifically, we switched to another CTC decoder implementation that depends on softmax being applied. Without the softmax, it will produce gibberish …

pranoothatwar · January 22, 2019, 7:38am

Hello,

Thank you for the reply.

I am using the DeepSpeech API version 0.4.1, which gives me the correct result.
For my own script, which uses output_graph.pb, I am using TensorFlow 1.12.1, and the output_graph.pb is from 0.4.1. I use the output of ‘prefix/logits:0’ from the graph as input to tf.nn.ctc_beam_search_decoder() to get the sequence and probabilities.

lissyx · January 22, 2019, 7:44am

Then that’s likely what is wrong: the model applies a softmax before the CTC decoder step, see DeepSpeech/DeepSpeech.py at 015551a80cb0454679eeea8f085f272b79e18751 · mozilla/DeepSpeech · GitHub. I’m not sure you can decode properly with the TensorFlow implementation.
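To make the logits versus softmax distinction concrete, here is a minimal pure-Python sketch of what a softmax (and its log) does to one frame of network output. The function names and the three-class `frame_logits` values are made up for illustration; they are not taken from DeepSpeech.py or from the exported graph.

```python
import math

def softmax(logits):
    """Turn raw scores for one time step into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_softmax(logits):
    """Log of the softmax, computed via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

# Hypothetical raw scores for a single frame (illustrative values only).
frame_logits = [2.0, -1.0, 0.5]
probs = softmax(frame_logits)
log_probs = log_softmax(frame_logits)
```

A decoder written for probabilities will misread raw logits (which can be negative and do not sum to 1), and a decoder that normalizes its input itself will produce a flatter distribution if it is handed values that were already passed through a softmax, so it matters which kind of tensor each decoder expects.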

pranoothatwar · January 22, 2019, 8:21am

Okay, thank you very much! I’ll check. It looks like I was using the tf op defined at: DeepSpeech/DeepSpeech.py at 015551a80cb0454679eeea8f085f272b79e18751 · mozilla/DeepSpeech · GitHub

lissyx · January 22, 2019, 9:15am

I don’t see any difference. If you scroll back a little, you get the line I pointed out, which does a softmax.

reuben · January 22, 2019, 12:02pm

There’s a node in the graph called “raw_logits” (see function BiRNN) which you should be able to feed into TensorFlow’s CTC decoder.
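As a rough illustration of how a log probability can be attached to a sequence decoded from raw logits, here is a minimal pure-Python sketch of greedy (best-path) CTC decoding. It is a simplification, not TensorFlow's beam search and not the decoder used by the deepspeech client. The function names and toy logits are hypothetical, and the blank label is assumed to be the last class, which is TensorFlow's CTC convention.

```python
import math

def log_softmax(logits):
    """Log of the softmax for one time step, via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def greedy_ctc_decode(logits_per_frame, blank_index):
    """Best-path CTC decoding.

    Picks the most likely class in every frame, collapses consecutive
    repeats, and drops blanks. The returned log probability is that of
    this single alignment, not the sum over all alignments that would
    produce the same label sequence.
    """
    labels = []
    path_log_prob = 0.0
    prev = None
    for frame in logits_per_frame:
        lp = log_softmax(frame)
        best = max(range(len(lp)), key=lp.__getitem__)
        path_log_prob += lp[best]
        if best != prev and best != blank_index:
            labels.append(best)
        prev = best
    return labels, path_log_prob

# Toy example: classes 0 = "a", 1 = "b", 2 = blank (assumed last).
alphabet = ["a", "b"]
toy_logits = [
    [5.0, 0.0, 0.0],  # a
    [5.0, 0.0, 0.0],  # a (repeat, collapsed)
    [0.0, 0.0, 5.0],  # blank
    [0.0, 5.0, 0.0],  # b
    [0.0, 0.0, 5.0],  # blank (separates the two b's)
    [0.0, 5.0, 0.0],  # b
]
labels, log_prob = greedy_ctc_decode(toy_logits, blank_index=2)
text = "".join(alphabet[i] for i in labels)
```

Because this sketch scores characters only, it should not be expected to match a transcription produced with an external language model, if the deepspeech client is run with one.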

pranoothatwar · January 22, 2019, 12:32pm

Yes, I saw that. I tried using raw_logits:0 directly, as well as applying a softmax to raw_logits from BiRNN, as input to tf.nn.ctc_beam_search_decoder(), but the output sequence is still not as good as that of the deepspeech API.

I have two questions:

Is there any way I can get the log probabilities using the deepspeech API?
What modifications need to be made so that the sequence generated using tf.nn.ctc_beam_search_decoder() is better?

I have modified the script at https://github.com/pvanickova/DeepSpeech/blob/master/bin/show_inferred_characters.py by adding tf.nn.ctc_beam_search_decoder() to it.
