Thanks for the reply and the info links. I will read the relevant papers in detail.
Coming back to my tests: the first result contains a letter sequence that is not a legal word in the language model:
" trialastruodle"
where the expected words are
"try our strudel"
I assume this indicates that the decoder's confidence in that part of the audio was too low to produce the correct words. I see this as both bad and good. Bad, because the decoder should return only words that are legal in the language model. Good, because it signals lower decoding confidence for that part of the audio, which is useful information. I could use a post-processor that checks for such illegal letter sequences. When it finds one, it can apply sequence similarity to map it to a word sequence that is legal in the language model, and assign that span a lower confidence for the later processing stages (NLU, dialogue management and so on).
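
A rough sketch of what I mean by such a post-processor, in Python. Everything here is an assumption made for illustration: the tiny vocabulary, the names `repair_token` and `postprocess`, the `penalty` factor, and the use of `difflib.SequenceMatcher` as the similarity measure. It is not tied to any particular decoder. Plain letter similarity will not necessarily recover the intended words, so a phonetic distance may be a better choice in practice.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy vocabulary; a real one would come from the language model.
VOCAB = ["try", "our", "strudel", "apple", "the", "please"]


def similarity(a, b):
    """Letter-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def repair_token(token, vocab, max_words=3):
    """Find the sequence of up to max_words vocabulary words whose
    concatenation is most similar to the illegal token.

    Brute force over all combinations, so only usable for a toy vocabulary.
    """
    best_phrase, best_score = None, 0.0
    for n in range(1, max_words + 1):
        for words in product(vocab, repeat=n):
            score = similarity(token, "".join(words))
            if score > best_score:
                best_phrase, best_score = " ".join(words), score
    return best_phrase, best_score


def postprocess(transcript, vocab, penalty=0.5):
    """Return (text, confidence) pairs, one per token of the transcript.

    Legal words pass through with confidence 1.0. Illegal letter sequences
    are replaced by the closest legal word sequence, and their confidence
    is the similarity score scaled down by the (assumed) penalty factor.
    """
    legal = set(vocab)
    result = []
    for token in transcript.split():
        if token in legal:
            result.append((token, 1.0))
        else:
            phrase, score = repair_token(token, vocab)
            result.append((phrase, score * penalty))
    return result


if __name__ == "__main__":
    for text, confidence in postprocess("please trialastruodle", VOCAB):
        print(f"{text!r}: confidence {confidence:.2f}")
```

The later stages (NLU, dialogue management) would then see only legal words, plus a confidence value telling them which spans were repaired.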