I’m starting a new project on reading assessment with a small vocabulary (~1000 words). Before digging into DeepSpeech, I would like to ask whether anybody has already tried a subword-piece approach (“reading” “assessment” -> “rea ding ass ess ment”) with DeepSpeech. If you have, any feedback would be welcome.
Thank you for replying. The goal is oral reading assessment, meaning that the sequence decoded by the ASR will be compared to the sequence the reader was supposed to read.
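To make that comparison concrete, here is a minimal sketch of a word-level alignment between the expected text and the decoded text. It uses Python's standard `difflib`; the function name `reading_errors` and the example sentences are only illustrations, not part of DeepSpeech.

```python
from difflib import SequenceMatcher


def reading_errors(expected: str, decoded: str):
    """Return (tag, expected_words, decoded_words) for every mismatch.

    tag is one of "replace", "delete" or "insert", as reported by difflib.
    """
    ref = expected.lower().split()
    hyp = decoded.lower().split()
    # autojunk=False keeps difflib from ignoring very frequent words.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    return [
        (tag, ref[i1:i2], hyp[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


errors = reading_errors(
    "the reading assessment is short",
    "the reading assessent is short",
)
print(errors)  # [('replace', ['assessment'], ['assessent'])]
```

This only works if the decoder actually outputs the misread form, which is exactly the difficulty with a word-level language model.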
The main problem is that if a word is not read correctly (“assessment” → “assessent”), decoding with the usual language model will probably still give us “assessment”. We then can’t detect the error except with a probability threshold or, in the best case (maybe by tuning the LM weight), by getting an OOV.
I read a paper that suggests working on subwords (ass ess ment), but that involves some rework of the labels, and it becomes much more of a “phonetic” labelling, so I feel it is drifting a bit away from the standard objective of DeepSpeech.
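As a rough illustration of what the label rework could look like, here is a small sketch that splits words into subword pieces by greedy longest match. The inventory `SUBWORDS` is purely hypothetical (taken from the example above), and this is not the method from the paper nor anything DeepSpeech provides.

```python
# Hypothetical subword inventory, only for illustration.
SUBWORDS = {"rea", "ding", "ass", "ess", "ment"}


def segment(word: str, inventory=SUBWORDS):
    """Greedy longest-match segmentation, falling back to single characters."""
    pieces, start = [], 0
    while start < len(word):
        # Try the longest candidate first; a single character always matches.
        for end in range(len(word), start, -1):
            if word[start:end] in inventory or end == start + 1:
                pieces.append(word[start:end])
                start = end
                break
    return pieces


print(segment("assessment"))  # ['ass', 'ess', 'ment']
print(segment("assessent"))   # ['ass', 'ess', 'e', 'n', 't']
```

With labels like these, the misreading shows up as a different piece sequence instead of being pulled back to a known word.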
I hope I’ve made my question clearer.
lissyx
Yes. I’m going to ask a dumb question, but what results do you get without any language model at all? That should get you a more phonetic, raw output.
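For readers unfamiliar with what "raw output" means for a CTC-trained acoustic model, here is a minimal sketch of best-path (greedy) CTC decoding with no language model. It is a generic illustration and not DeepSpeech's own decoder; the alphabet, the blank index and the frame probabilities are made-up assumptions.

```python
def greedy_ctc_decode(frames, alphabet, blank):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks.

    frames   -- list of per-frame probability lists (one value per label)
    alphabet -- label index -> character
    blank    -- index of the CTC blank label
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frames]
    out, prev = [], None
    for idx in best:
        # Emit a character only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)


# Toy example: two characters plus a blank at index 2 (values are invented).
toy_frames = [
    [0.90, 0.05, 0.05],  # a
    [0.80, 0.10, 0.10],  # a (repeat, collapsed)
    [0.10, 0.10, 0.80],  # blank
    [0.70, 0.20, 0.10],  # a (new, because a blank came in between)
    [0.10, 0.80, 0.10],  # b
    [0.20, 0.70, 0.10],  # b (repeat, collapsed)
]
print(greedy_ctc_decode(toy_frames, ["a", "b"], blank=2))  # aab
```

Because nothing constrains the output to real words, a misreading such as “assessent” has a chance of surviving in the transcript.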
Hi,
I saw your topic on the forum. For now I have trained the model at the character level, and I get a good WER of about 3.5% on my test set (the dataset is a small vocabulary of about 150 distinct short sentences read by children, 50h), so I haven’t explored the subword-piece solution yet.
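For reference, WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. A minimal sketch, with a function name and example sentences chosen only for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


# One substitution and one deletion over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))  # 0.333
```

Note that a low WER on correctly read sentences says little about how well misreadings are detected, which is why the next step looks at sets containing them.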
My next step for reading assessment is to analyse inference performance on specific sets containing misreadings. It is at that step (if it doesn’t work well enough) that I will know whether I need to try a subword-piece approach.
I finally chose to work with a rule-based language model for each text, which showed some promising results, but the project is frozen because of COVID. I’m hoping to restart it this spring.
Raw decoding didn’t give good results. The subword approach is still my preferred option, but I felt it was a bit far from the DeepSpeech solution, which is grapheme-based and bypasses much of the phonemic interpretation.