Before the logits are sent to ctc_beam_search_decoder on line , I do print(np.shape(logits)). I get (129, 29) or (137, 29) etc. I know 129 or 137 is the num_strides. I assume 29 should be for the 29 possible characters. But in my alphabet.txt there are 26 letters, space and apostrophe, a total of 28. So what does 29 mean?
Answering myself: I observed that the highest probability is never on the 29th index, so I assumed it's just a fake entry. So now:
logits = np.squeeze(logits)
#print(num_strides)
#print(np.shape(logits))
a = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","'"," "]
raw = ""
char_idxs = np.argmax(logits, axis=1)
print(char_idxs)
for char_id in char_idxs:
    raw += a[char_id-1]
print(raw)
scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
                FLAGS.lm_binary_path, FLAGS.lm_trie_path,
                Config.alphabet)
prints:
why should o nee hh a l l t o onn the ww a y
and the decoded output is why should one hall to on the way
It's a "blank", not a space, required for CTC.
Thanks, so the 29 items are:
a = [" ","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","'","ϵ"]
where ϵ is the CTC "blank"
Here's a greedy best path algorithm that handles the blank index and merges repeated characters (merge_repeated_ defaults to true): https://github.com/tensorflow/tensorflow/blob/6612da89516247503f03ef76e974b51a434fb52e/tensorflow/core/util/ctc/ctc_decoder.h#L95-L107
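For reference, the same greedy best path idea can be sketched in NumPy. This is only a rough illustration, not the TensorFlow implementation: it assumes the label order given above (space, a-z, apostrophe, with the blank as the last index), and the names LABELS, BLANK and greedy_ctc_decode are made up for this example.

```python
import numpy as np

# Assumed label order from the post above: space, a-z, apostrophe.
# The CTC blank is taken to be the last index (28), so it has no entry here.
LABELS = [" "] + [chr(c) for c in range(ord("a"), ord("z") + 1)] + ["'"]
BLANK = len(LABELS)  # 28


def greedy_ctc_decode(logits, merge_repeated=True):
    """Greedy best path decode of a (num_strides, num_classes) array."""
    best = np.argmax(logits, axis=1)  # most likely class per time step
    out = []
    prev = None
    for idx in best:
        # Emit a character unless it is the blank, or a repeat of the
        # previous time step's class while merge_repeated is on.
        if idx != BLANK and not (merge_repeated and idx == prev):
            out.append(LABELS[idx])
        prev = idx  # a blank also resets the repeat check
    return "".join(out)


# Synthetic one-hot "logits" for the frames: h h ϵ e l ϵ l o
frames = [8, 8, 28, 5, 12, 28, 12, 15]
demo = np.eye(29)[frames]
print(greedy_ctc_decode(demo))
```

The blank between the two l frames is what keeps the double letter from being merged; without it, "ll" would collapse to a single "l". With this lookup there is no need for the a[char_id-1] offset, since index 0 is the space and index 28 is skipped.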