Hi everybody. I have built a Vietnamese model with about WER=0.3, however my end goal is to clearly detect vulgar words. I have read and in the documentation that you will retrain the scorer model if you have a need to recognize medical words, how should this method be done? am I going to insert more vulgar words into vocabulary.txt. As for the model having to add vulgar audio files, is taking out the saved checkpoint files to retrain with only raw audio files is a correct method.
Gradulations! You can use the hotwords functionality too.
https://deepspeech.readthedocs.io/en/latest/Python-API.html?highlight=hotwords
1 Like
Excuse me! What is “boost”? I can’t understand it. Thank you.
boost
means how much more do you want to make this word likely to be predicted.
1 Like