Hello, I was wondering: shouldn't the decoder act as a greedy decoder when alpha and beta in the scorer are set to zero and the beam size is 1? Or am I missing something basic?
Right now when I do that, my results are much worse than, and quite different from, TensorFlow's greedy decoder. From what I have seen, the decoded sequences are mostly just much shorter (but sort of correct for the amount of audio they cover)…
Also, if I increase the beam size (let's say to 512), the transcriptions get better and more or less converge to the greedy decoder's output.
You can also simply not provide the LM scorer. I haven't tested this exact use case, so I don't know exactly what is going on, but there could be small differences between the implementation of TF's CTC decoder and the implementation we use.
Yeah, you are absolutely right. Providing no scorer with beam size = 1 seems to work fine, matching the greedy (argmax) decoder. So there must be some kind of extra scoring happening when you do set the scorer that is not obvious to me.
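To illustrate why beam size 1 without a scorer should match the greedy decoder, here is a minimal sketch of both decoders over a CTC log-probability matrix. This is a simplified best-path beam search over alignments, not the prefix beam search real decoders (like TF's or ds_ctcdecoder's) use, and all names here are hypothetical; the point is just that with no LM term and a beam of 1, the search keeps exactly the per-frame argmax path, so the two outputs coincide:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy (argmax) CTC decode: best label per frame,
    then collapse repeats and drop blanks."""
    best = np.argmax(log_probs, axis=1)
    out, prev = [], None
    for t in best:
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out

def ctc_beam_decode(log_probs, beam_width=1, blank=0):
    """Toy best-path beam search with no LM scoring:
    keep the top `beam_width` alignment paths by log-probability."""
    beams = [((), 0.0)]  # (alignment path, cumulative log-prob)
    for frame in log_probs:
        candidates = []
        for path, score in beams:
            for label, lp in enumerate(frame):
                candidates.append((path + (label,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # Collapse the best alignment path the same way greedy does.
    out, prev = [], None
    for t in beams[0][0]:
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out

# With beam_width=1 the beam search picks the argmax at every frame,
# so it must agree with the greedy decoder.
rng = np.random.default_rng(0)
lp = np.log(rng.dirichlet(np.ones(4), size=12))
print(ctc_beam_decode(lp, beam_width=1) == ctc_greedy_decode(lp))
```

Once an LM scorer is in the loop, the search scores whole prefixes (and typically merges alignments into prefixes), so even with alpha = beta = 0 the bookkeeping differs from this toy version, which could explain the divergence you saw.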