How can we get log probabilities of decoded outputs

Hello,
I am trying to use output_graph.pb to generate the logits and tf.nn.ctc_beam_search_decoder() to get the output sequence, but the generated output is gibberish. In contrast, when I use the DeepSpeech API, it transcribes the audio correctly.

  1. Is there any way I can get the correct log probabilities of decoded outputs using the DeepSpeech API?

  2. How can I use tf.nn.ctc_beam_search_decoder() to get the correct sequence?

Kindly let me know if I am doing anything wrong at any step.

It’s hard to answer your question without more details on the versions of the tools you used. Specifically, we switched to another CTC decoder implementation that depends on softmax being applied. Without the softmax, it will produce gibberish …
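To make the softmax point concrete, here is a minimal pure-Python sketch of turning one time step of raw logits into probabilities and log probabilities. The logit values and the four-class layout (blank as the last index) are made up for illustration and are not taken from the DeepSpeech graph.

```python
import math

def log_softmax(logits):
    """Return log probabilities for one frame of raw logits (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

# Hypothetical logits for a single time step over 4 classes (last one = CTC blank).
frame_logits = [2.0, -1.0, 0.5, 3.0]
frame_log_probs = log_softmax(frame_logits)
frame_probs = [math.exp(lp) for lp in frame_log_probs]

print(frame_probs)      # non-negative values that sum to 1
print(frame_log_probs)  # every value is <= 0
```

A decoder that expects probabilities will misbehave if it is handed the unnormalized `frame_logits` instead, and a decoder that applies its own softmax should be given logits rather than already-normalized values.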

Hello,

Thank you for the reply.

  1. I am using DeepSpeech API version 0.4.1, which gives me the correct result.

  2. For my own script, which uses output_graph.pb, I am using TensorFlow 1.12.1, and the output_graph.pb is from 0.4.1. I also use the output of ‘prefix/logits:0’ from the graph as input to tf.nn.ctc_beam_search_decoder() to get the sequence and probabilities.

Then that’s likely what is wrong: the model applies a softmax before the CTC decoder step: DeepSpeech/DeepSpeech.py at 015551a80cb0454679eeea8f085f272b79e18751 · mozilla/DeepSpeech · GitHub. I’m not sure you can decode properly with the TensorFlow implementation.

Okay, thank you very much! I’ll check. It looks like I was using the TF op defined at: DeepSpeech/DeepSpeech.py at 015551a80cb0454679eeea8f085f272b79e18751 · mozilla/DeepSpeech · GitHub

I don’t see any difference. If you scroll back a little, you get to the line I pointed out, which does a softmax :)

There’s a node in the graph called “raw_logits” (see function BiRNN) which you should be able to feed into TensorFlow’s CTC decoder.

Yes, I saw that. I tried using raw_logits:0 directly, as well as applying a softmax to raw_logits from BiRNN, as input to tf.nn.ctc_beam_search_decoder(), but the output sequence is still not as good as the DeepSpeech API’s.
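One way to check whether the logits or the decoder is at fault is a greedy (best-path) CTC decode, which needs no decoder op at all. The sketch below is pure Python and rests on assumptions: logits are laid out as [time][classes] for a single utterance, the blank is the last class index, and the two-letter alphabet is hypothetical. The value it returns is the log probability of one alignment path, not the total sequence probability a beam search would report.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def greedy_ctc_decode(logits, alphabet, blank):
    """Pick the argmax class per frame, collapse repeats, then drop blanks.

    Returns (text, log probability of that single best path).
    """
    labels = []
    path_log_prob = 0.0
    prev = None
    for frame in logits:
        log_probs = log_softmax(frame)
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        path_log_prob += log_probs[best]
        if best != prev and best != blank:
            labels.append(best)
        prev = best
    return "".join(alphabet[i] for i in labels), path_log_prob

# Hypothetical input: 4 frames, classes = ["a", "b", blank].
alphabet = ["a", "b"]
logits = [
    [5.0, 0.0, 0.0],  # "a"
    [5.0, 0.0, 0.0],  # "a" again, collapsed into the previous one
    [0.0, 0.0, 5.0],  # blank
    [0.0, 5.0, 0.0],  # "b"
]
text, path_log_prob = greedy_ctc_decode(logits, alphabet, blank=2)
print(text, path_log_prob)
```

If the greedy transcript from raw_logits is already close to the expected text, the acoustic output is probably fine and the remaining gap comes from the decoding step.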

I have two questions:

  1. Is there any way I can get the log probabilities using the DeepSpeech API?
  2. What modifications need to be made so that the sequence generated using tf.nn.ctc_beam_search_decoder() is better?
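For comparison, a plain CTC prefix beam search that returns a log probability for each candidate can be sketched in pure Python. This is neither TensorFlow's op nor the DeepSpeech decoder, and it uses no language model or other external scoring, so treat it only as a baseline. The function name, the [time][classes] input layout, and the tiny two-class example are assumptions made for illustration.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*values):
    """log(sum(exp(v))) computed stably; returns -inf if every input is -inf."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def ctc_prefix_beam_search(log_probs, blank, beam_width=8):
    """log_probs: [time][classes] per-frame log-softmax outputs.

    Returns a list of (label_tuple, log_probability), best first.
    """
    # Per prefix: (log P of paths ending in blank, log P of paths ending in non-blank).
    beam = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        next_beam = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam.items():
            for c, lp in enumerate(frame):
                if c == blank:
                    b, nb = next_beam[prefix]
                    next_beam[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                elif prefix and prefix[-1] == c:
                    # A repeat with no blank in between collapses into the same prefix.
                    b, nb = next_beam[prefix]
                    next_beam[prefix] = (b, logsumexp(nb, p_nb + lp))
                    # A repeat that follows a blank extends the prefix.
                    ext = prefix + (c,)
                    b, nb = next_beam[ext]
                    next_beam[ext] = (b, logsumexp(nb, p_b + lp))
                else:
                    ext = prefix + (c,)
                    b, nb = next_beam[ext]
                    next_beam[ext] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        ranked = sorted(next_beam.items(),
                        key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beam = dict(ranked[:beam_width])
    return [(prefix, logsumexp(*scores)) for prefix, scores in beam.items()]

# Hypothetical 2-frame, 2-class input: index 0 = "a", index 1 = blank.
frames = [[math.log(0.6), math.log(0.4)],
          [math.log(0.6), math.log(0.4)]]
results = ctc_prefix_beam_search(frames, blank=1)
best_labels, best_log_prob = results[0]
print(best_labels, math.exp(best_log_prob))  # (0,) with probability about 0.84
```

In this toy case the paths "aa", "a" + blank, and blank + "a" all collapse to "a", so its probability is 0.36 + 0.24 + 0.24 = 0.84, and the returned value is the log of that sum.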

I have modified the script at https://github.com/pvanickova/DeepSpeech/blob/master/bin/show_inferred_characters.py by adding tf.nn.ctc_beam_search_decoder() to it.