Hi Lissyx,
Thank you for replying. The goal is oral reading assessment: the sequence decoded by the ASR will be compared to the text the reader was supposed to read.
The main problem is that if a word is not read correctly (“assessment” -> “assessent”), decoding with the usual language model will probably still give us “assessment”, so we can’t detect the error except with a probability threshold. In the best case (maybe by tuning the LM weight) we would get an OOV.
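To make the comparison step concrete, here is a minimal sketch of a word-level alignment between the expected text and the decoded text, using Python's standard `difflib`. The function name `find_reading_errors` and the sample sentences are my own assumptions for illustration, not part of DeepSpeech.

```python
from difflib import SequenceMatcher


def find_reading_errors(expected: str, decoded: str):
    """Align expected and decoded text word by word and list the differences.

    Hypothetical helper for illustration; not a DeepSpeech API.
    """
    exp_words = expected.lower().split()
    dec_words = decoded.lower().split()
    # autojunk=False so frequent words are never ignored during alignment
    matcher = SequenceMatcher(a=exp_words, b=dec_words, autojunk=False)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is "replace", "delete" (word skipped) or "insert" (extra word)
            errors.append((tag, exp_words[i1:i2], dec_words[j1:j2]))
    return errors


expected = "the oral reading assessment is today"
# The misreading is only visible if the decoder does not "correct" it
print(find_reading_errors(expected, "the oral reading assessent is today"))
# If the language model outputs "assessment" anyway, nothing is flagged
print(find_reading_errors(expected, "the oral reading assessment is today"))
```

The second call shows the issue: once the language model has replaced the misread word with the expected one, the alignment has nothing left to report.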
I read a paper that suggests working on sub-word units (e.g. ass ess ment), but that involves reworking the labels, and it becomes much more of a “phonetic” labelling, so I feel it goes a bit beyond the standard objective of DeepSpeech.
I hope I’ve made my question clearer.