Hello, I was wondering: shouldn't the decoder act as a greedy decoder when alpha and beta in the scorer are set to zero and the beam size is 1? Or am I missing something basic?
Right now when I do that, my results are much worse than, and quite different from, TensorFlow's greedy decoder. From what I have seen, the decoded sequences are mostly just much shorter (but sort of correct for the amount of audio they cover)…
Also, if I increase the beam size (let's say to 512), the transcriptions get better and more or less converge to the greedy decoder's output.
You can also simply not provide the LM scorer. I haven't tested this exact use case, so I don't know exactly what is going on, but there could be small differences between the implementation of TF's CTC decoder and the implementation we use.
Yeah, you are absolutely right. Providing no scorer with beam size = 1 seems to work fine, matching the greedy (argmax) decoder. So there must be some kind of extra scoring happening when you do set the scorer that is not obvious to me.
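To illustrate why beam size 1 without a scorer should match the greedy decoder, here is a minimal sketch of both decoders over a CTC log-probability matrix. This is a simplified best-path beam search over alignments, not the prefix beam search real decoders (like TF's or ds_ctcdecoder's) use, and all names here are hypothetical; the point is just that with no LM term and a beam of 1, the search keeps exactly the per-frame argmax path, so the two outputs coincide:

```python
import numpy as np

def ctc_greedy_decode(log_probs, blank=0):
    """Greedy (argmax) CTC decode: best label per frame,
    then collapse repeats and drop blanks."""
    best = np.argmax(log_probs, axis=1)
    out, prev = [], None
    for t in best:
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out

def ctc_beam_decode(log_probs, beam_width=1, blank=0):
    """Toy best-path beam search with no LM scoring:
    keep the top `beam_width` alignment paths by log-probability."""
    beams = [((), 0.0)]  # (alignment path, cumulative log-prob)
    for frame in log_probs:
        candidates = []
        for path, score in beams:
            for label, lp in enumerate(frame):
                candidates.append((path + (label,), score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    # Collapse the best alignment path the same way greedy does.
    out, prev = [], None
    for t in beams[0][0]:
        if t != prev and t != blank:
            out.append(int(t))
        prev = t
    return out

# With beam_width=1 the beam search picks the argmax at every frame,
# so it must agree with the greedy decoder.
rng = np.random.default_rng(0)
lp = np.log(rng.dirichlet(np.ones(4), size=12))
print(ctc_beam_decode(lp, beam_width=1) == ctc_greedy_decode(lp))
```

Once an LM scorer is in the loop, the search scores whole prefixes (and typically merges alignments into prefixes), so even with alpha = beta = 0 the bookkeeping differs from this toy version, which could explain the divergence you saw.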