Hello,
I am trying to use output_graph.pb to generate the logits and tf.nn.ctc_beam_search_decoder() to get the output sequence, but the generated output is gibberish. In contrast, when I use the DeepSpeech API it transcribes the audio correctly.
Is there any way I can get correct log probabilities for the decoded outputs using the DeepSpeech API?
How can I use tf.nn.ctc_beam_search_decoder() to get the correct sequence?
Kindly let me know if I am doing anything wrong at any step.
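(Not from the original post: a minimal sketch of a first debugging step for this setup, listing the nodes of the frozen graph whose names mention "logits" so the decoder is fed the intended tensor. The path "output_graph.pb" is assumed to be the 0.4.1 export discussed here.)

```python
# Sketch only: inspect the frozen graph to see which acoustic-output
# nodes it exposes (e.g. "logits" vs. "raw_logits").
import tensorflow as tf  # TF 1.x style API, matching the versions in this thread

graph_def = tf.GraphDef()
with tf.gfile.GFile("output_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    if "logits" in node.name.lower():
        print(node.name, node.op)
```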
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
It's complicated to answer your question without more details on the versions of the tools you used. Specifically, we switched to another CTC decoder implementation that depends on softmax being applied. Not applying the softmax will produce gibberish …
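(Editor's sketch, not part of the thread: a quick way to check whether the tensor you fetch already has softmax applied inside the graph, i.e. the class scores sum to one at every time step. That matters because a decoder that normalizes again, or that expects raw logits, can behave much worse on already-softmaxed values. The [time, batch, classes] shape is an assumption.)

```python
import numpy as np

def looks_softmaxed(acoustic_output, axis=-1, tol=1e-3):
    """acoustic_output: array assumed to be [time, batch, num_classes]."""
    sums = acoustic_output.sum(axis=axis)
    return bool(np.all(np.abs(sums - 1.0) < tol) and np.all(acoustic_output >= 0))

# Toy demonstration; random data stands in for the real network output.
raw = np.random.randn(100, 1, 29).astype(np.float32)            # raw logits
probs = np.exp(raw) / np.exp(raw).sum(axis=-1, keepdims=True)    # softmaxed
print(looks_softmaxed(raw), looks_softmaxed(probs))              # False True
```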
I am using DeepSpeech API version 0.4.1, which gives me the correct result.
For my own script, which uses output_graph.pb, I am using TensorFlow 1.12.1, and the output_graph.pb is from 0.4.1. I use the output of 'prefix/logits:0' from the graph as input to tf.nn.ctc_beam_search_decoder() to get the sequence and probabilities.
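(For reference, a sketch of how the prefix/... tensor names arise and how to fetch them. The input tensor names in the comments, "prefix/input_node:0" and "prefix/input_lengths:0", are only my assumption about the 0.4.1 export and should be checked against the graph itself.)

```python
# Sketch: import the frozen graph under the name scope "prefix" and fetch
# the acoustic output tensor by name. Input tensor names are assumptions.
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("output_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="prefix")

logits_t = graph.get_tensor_by_name("prefix/logits:0")
# Hypothetical input names: verify them with the node-listing snippet above.
# features_t = graph.get_tensor_by_name("prefix/input_node:0")
# lengths_t  = graph.get_tensor_by_name("prefix/input_lengths:0")

with tf.Session(graph=graph) as sess:
    pass  # sess.run(logits_t, feed_dict={features_t: ..., lengths_t: ...})
```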
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Yes, I saw that. I tried using raw_logits:0, as well as applying softmax to the raw_logits from the BiRNN, as input to tf.nn.ctc_beam_search_decoder(), but the output sequence is still not as good as the DeepSpeech API's.
I have two questions:
Is there any way I can get the log probabilities using the DeepSpeech API?
What modifications need to be made so that the sequence generated by tf.nn.ctc_beam_search_decoder() is better?
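(A partial answer to the second question, as an editor's sketch rather than anything from the thread: tf.nn.ctc_beam_search_decoder() already returns log probabilities as its second output, a [batch, top_paths] matrix; merge_repeated=False is usually what you want for standard CTC decoding, and the inputs should be time-major, pre-softmax logits. The 29-class alphabet here, space, a–z, apostrophe, plus the CTC blank as the last class, is my assumption about the 0.4.1 model, and the random array only stands in for real network output. Even so, the plain TensorFlow decoder has no language model, which is one reason the DeepSpeech client will still transcribe better.)

```python
import numpy as np
import tensorflow as tf

time_steps, batch, num_classes = 100, 1, 29
# Stand-in for the real acoustic output, shaped [time, batch, classes].
acoustic = np.random.randn(time_steps, batch, num_classes).astype(np.float32)

logits = tf.placeholder(tf.float32, [None, batch, num_classes])
seq_len = tf.placeholder(tf.int32, [batch])

# decoded: list of top_paths SparseTensors; log_probs: [batch, top_paths]
decoded, log_probs = tf.nn.ctc_beam_search_decoder(
    logits, seq_len, beam_width=1024, top_paths=5, merge_repeated=False)

# Assumed character set; the CTC blank is the implicit last class (index 28).
alphabet = " abcdefghijklmnopqrstuvwxyz'"

with tf.Session() as sess:
    paths, scores = sess.run(
        [decoded, log_probs],
        feed_dict={logits: acoustic, seq_len: [time_steps]})
    for sparse, score in zip(paths, scores[0]):
        text = "".join(alphabet[i] for i in sparse.values)
        print(score, text)
```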