Hi,
The current speech recognition output contains a few very long character sequences. I understand this is a known bug that is currently being worked on.
I also know the issue has been traced to some words not being in the vocabulary. If I have a few transcripts that are representative of my text, what is the best way to pass this information to the system? The options I can think of are:
- Build an lm.binary and a trie from a combined corpus: my texts + the Common Voice texts + other text from the internet (e.g. Wikipedia).
- Spell-correct the transcription output using a spell correction algorithm.
(Any recommendations for libraries that do spell correction well? Underneath, a spell corrector is yet another language model, so I am guessing the right thing to do is to fix the original language model.)
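For the first option, the main preparation step is merging the corpora into one normalized text file (one sentence per line) that can then be fed to KenLM's `lmplz` and `build_binary`, and to whatever trie/scorer tool your DeepSpeech release provides (exact commands and flags depend on the version, so check its docs). The sketch below assumes the model's alphabet is lowercase `a-z`, apostrophe and space; change the pattern to match your alphabet file. It simply drops digits and other out-of-alphabet characters, whereas a real pipeline would probably spell numbers out. The `my_repeat` parameter is a hypothetical, crude heuristic for keeping a small domain corpus from being swamped by Wikipedia; whether it helps is something to test.

```python
import re
from typing import Iterable, List

# Assumption: the acoustic model's alphabet is lowercase a-z, apostrophe and space.
# Change this pattern to match the alphabet file your model was trained with.
OUT_OF_ALPHABET = re.compile(r"[^a-z' ]+")


def normalize_line(line: str) -> str:
    """Lowercase a line, replace out-of-alphabet characters with spaces, collapse whitespace."""
    cleaned = OUT_OF_ALPHABET.sub(" ", line.lower())
    return " ".join(cleaned.split())


def build_lm_corpus(my_texts: Iterable[str], other_texts: Iterable[str], my_repeat: int = 1) -> List[str]:
    """Merge domain transcripts with general text into one normalized corpus.

    my_repeat (hypothetical knob) repeats the domain lines so they are not
    swamped by a much larger general corpus such as Wikipedia.
    """
    corpus: List[str] = []
    my_lines = [normalize_line(t) for t in my_texts]
    for _ in range(my_repeat):
        corpus.extend(line for line in my_lines if line)
    corpus.extend(line for line in (normalize_line(t) for t in other_texts) if line)
    return corpus


if __name__ == "__main__":
    mine = ["The patient's ECG looked normal.", "Order 2 units, stat!"]
    general = ["Wikipedia is a free online encyclopedia."]
    corpus = build_lm_corpus(mine, general, my_repeat=2)
    # This file is the plain-text input for the language model build step.
    with open("lm_corpus.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(corpus) + "\n")
```

Normalizing the LM text the same way the acoustic model's labels were normalized matters because a word that only ever appears with different casing or punctuation in the corpus is effectively out of vocabulary.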
I have a feeling the acoustic model is working well and is not to blame here: if I read some of the particularly bad parts of the transcripts, the text does sound like the word that was actually spoken.
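One way to check the vocabulary hypothesis is to list which words in the reference transcripts never occur in the LM corpus. If the words that come out garbled are on that list, that points at the language model rather than the acoustic model. This is a minimal sketch; it assumes the corpus lines are already normalized and that the references need only lowercasing (punctuation is not stripped here), and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def vocabulary(corpus_lines: Iterable[str]) -> Set[str]:
    """Collect every whitespace-separated token in the (already normalized) LM corpus."""
    vocab: Set[str] = set()
    for line in corpus_lines:
        vocab.update(line.split())
    return vocab


def oov_words(reference_transcripts: Iterable[str], vocab: Set[str]) -> List[Tuple[str, int]]:
    """Return reference words missing from the LM vocabulary, most frequent first."""
    counts: Counter = Counter()
    for text in reference_transcripts:
        counts.update(w for w in text.lower().split() if w not in vocab)
    return counts.most_common()


if __name__ == "__main__":
    vocab = vocabulary(["the patient looked normal", "order units stat"])
    refs = ["the patient had tachycardia", "order tachycardia units"]
    print(oov_words(refs, vocab))  # [('tachycardia', 2), ('had', 1)]
```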