Reason for using softmax activation function only during evaluation

During training, the output of the last hidden layer is passed directly to the CTC loss, but during inference a softmax activation is applied to that output before it is decoded. Why is the softmax only used at inference time?

tf.nn.ctc_loss applies the softmax internally, so the raw logits are fed to it during training. The decoder, on the other hand, expects its input to already have softmax applied, which is why it is applied explicitly at inference time.
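
A minimal sketch of that split, assuming TF2-style `tf.nn.ctc_loss` and a greedy decoder; the tensor names, shapes, and `blank_index` choice are illustrative assumptions, not taken from the original code:

```python
import tensorflow as tf

# logits: raw output of the last layer, shape [max_time, batch_size, num_classes]

def training_loss(labels, logits, label_length, logit_length):
    # tf.nn.ctc_loss applies softmax internally, so the raw logits are
    # passed straight in -- no activation is added here.
    return tf.nn.ctc_loss(
        labels=labels,
        logits=logits,
        label_length=label_length,
        logit_length=logit_length,
        logits_time_major=True,
        blank_index=-1,  # assumption: blank is the last class
    )

def inference_decode(logits, sequence_length):
    # Per the answer above, the decoder expects softmax to have been
    # applied already, so it is applied explicitly before decoding.
    probs = tf.nn.softmax(logits, axis=-1)
    decoded, neg_sum_logits = tf.nn.ctc_greedy_decoder(
        inputs=probs,
        sequence_length=sequence_length,
    )
    return decoded, neg_sum_logits
```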

Ok, got it. Thank you.