Raw logit shape

Abhinav · April 18, 2019, 12:07pm

Before the logits are sent to ctc_beam_search_decoder on line , I do print(np.shape(logits)) and get shapes such as (129, 29) or (137, 29). I know 129 or 137 is num_strides, and I assume 29 is the number of possible characters. But my alphabet.txt has only 28 entries: 26 letters, space, and apostrophe. So what does the 29th entry mean?

Abhinav · April 18, 2019, 12:26pm

Answering myself: I observed that the highest probability is never on the 29th index, so I assumed it's just a dummy entry. So now I do:

logits = np.squeeze(logits)
#print(num_strides)
#print(np.shape(logits))
a = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","'"," "]
raw = ""
char_idxs = np.argmax(logits, axis=1)
print(char_idxs)
for char_id in char_idxs:
    raw += a[char_id-1]
print(raw)

scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
                FLAGS.lm_binary_path, FLAGS.lm_trie_path,
                Config.alphabet)

prints:

why should o nee hh a l l t o onn the ww a y
and the decoded output is: why should one hall to on the way

kdavis · April 18, 2019, 12:48pm

It’s a “blank”, not a space, required for CTC.

Abhinav · April 18, 2019, 1:07pm

Thanks, so the 29 items are:
a = [" ","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","'","ϵ"]

where ϵ is the CTC ‘blank’
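
For reference, that mapping can be sketched in a few lines of Python. This is only a minimal sketch, assuming the label order listed in this post (space, a-z, apostrophe, then the blank as the last index); LABELS_WITH_BLANK and indices_to_labels are hypothetical names for illustration, not part of DeepSpeech. It indexes the list directly, with no -1 shift, so a blank frame shows up as ϵ instead of being printed as a space.

```python
import string

# Assumed order: space, a-z, apostrophe (28 alphabet entries), blank appended last.
LABELS = [" "] + list(string.ascii_lowercase) + ["'"]
BLANK = "ϵ"
LABELS_WITH_BLANK = LABELS + [BLANK]  # 29 entries, blank at index 28


def indices_to_labels(char_idxs):
    """Map per-frame argmax indices straight to labels (no -1 shift)."""
    return "".join(LABELS_WITH_BLANK[i] for i in char_idxs)


if __name__ == "__main__":
    # Hypothetical frame indices, not taken from a real model run.
    print(indices_to_labels([23, 8, 28, 25, 28, 0]))
```

Printing the raw per-frame labels this way makes it easy to see which frames the network assigned to the blank.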

reuben · April 18, 2019, 1:37pm

Here’s a greedy best path algorithm that handles the blank index and merges repeated characters (merge_repeated_ defaults to true): https://github.com/tensorflow/tensorflow/blob/6612da89516247503f03ef76e974b51a434fb52e/tensorflow/core/util/ctc/ctc_decoder.h#L95-L107
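
The same idea can be sketched in plain Python: take the argmax of each frame, merge consecutive repeats, then drop the blank. This is a rough sketch rather than the linked TensorFlow code; it assumes the label order given earlier in the thread with the blank at index 28, and greedy_ctc_decode and one_hot are hypothetical helper names. The per-frame argmax is written in pure Python here; np.argmax(logits, axis=1) would give the same best path.

```python
from itertools import groupby
import string

# Assumed order: space, a-z, apostrophe; the CTC blank is the index after them.
LABELS = [" "] + list(string.ascii_lowercase) + ["'"]
BLANK_INDEX = len(LABELS)  # 28


def greedy_ctc_decode(logits, merge_repeated=True):
    """Greedy best-path decode of a (num_frames, 29) score matrix."""
    # Best label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    if merge_repeated:
        # Collapse runs of the same index into a single occurrence.
        best_path = [idx for idx, _ in groupby(best_path)]
    # Blanks are removed only after merging, so "l, blank, l" stays "ll".
    return "".join(LABELS[idx] for idx in best_path if idx != BLANK_INDEX)


def one_hot(idx, size=len(LABELS) + 1):
    """Hypothetical helper: build a fake frame whose argmax is idx."""
    frame = [0.0] * size
    frame[idx] = 1.0
    return frame


if __name__ == "__main__":
    # Frames spelling: h h blank i blank blank space space l blank l
    frames = [one_hot(i) for i in [8, 8, 28, 9, 28, 28, 0, 0, 12, 28, 12]]
    print(greedy_ctc_decode(frames))  # hi ll
```

Because the blank is dropped after repeats are merged, a blank between two identical characters is what keeps a genuine double letter. The result of this greedy decode may still differ from the ctc_beam_search_decoder output, since the beam search also uses the language model scorer.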