Increasing the recognition rate for some special words

Hung_Phung · May 17, 2021, 9:07am

Hi everybody. I have built a Vietnamese model with a WER of about 0.3, but my end goal is to reliably detect vulgar words. I have read in the documentation that you should retrain the scorer if you need to recognize specialized vocabulary such as medical words. How should this be done? Should I insert more vulgar words into vocabulary.txt? As for the acoustic model, I would have to add audio files containing vulgar words: is it a correct method to take the saved checkpoint files and continue training from them with only those raw audio files?

ftyers · May 17, 2021, 4:31pm

Congratulations! You can use the hotwords functionality too.

https://deepspeech.readthedocs.io/en/latest/Python-API.html?highlight=hotwords

Hung_Phung · June 1, 2021, 9:42am

Excuse me! What is “boost”? I can’t understand it. Thank you.

ftyers · June 1, 2021, 6:34pm

Boost means how much more likely you want this word to be predicted.
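
To make that concrete, here is a minimal sketch of how hot-words and their boosts could be applied from Python. It assumes the `addHotWord(word, boost)` method of the DeepSpeech `Model` class described on the API page linked above; as far as I know from those docs, a positive boost makes the word more likely and a negative boost makes it less likely. The helper names `parse_hot_words` and `apply_hot_words`, the `word:boost` string format, the file names, and the boost values are only illustrative assumptions, so you would have to tune the values on your own data.

```python
from typing import Dict


def parse_hot_words(spec: str) -> Dict[str, float]:
    """Parse a 'word1:boost1,word2:boost2' string into {word: boost}.

    The string format is an assumption made for this sketch.
    """
    hot_words: Dict[str, float] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing commas and empty entries
        # Split on the last colon so the boost is always the final field.
        word, _, boost = pair.rpartition(":")
        if not word.strip():
            raise ValueError(f"expected 'word:boost', got {pair!r}")
        hot_words[word.strip()] = float(boost)
    return hot_words


def apply_hot_words(model, hot_words: Dict[str, float]) -> None:
    """Register every hot-word on a model object exposing addHotWord()."""
    for word, boost in hot_words.items():
        model.addHotWord(word, boost)


# Hypothetical usage with a real model (paths are placeholders):
#
#   from deepspeech import Model
#   model = Model("output_graph.pbmm")
#   model.enableExternalScorer("kenlm.scorer")
#   apply_hot_words(model, parse_hot_words("word_a:7.5,word_b:-2"))
#   text = model.stt(audio_buffer)  # audio_buffer: 16-bit int NumPy array
```

Starting with a small boost and increasing it gradually is probably the safer route, since a very large boost can make the decoder favour the hot-word even where it was not spoken.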