I have trained a DeepSpeech model on a 2,000-hour Hinglish dataset. It transcribes general words accurately but often fails to recognize domain-specific keywords.
Since I want it to recognize these domain-specific keywords correctly, I have a few questions:
- How can I add these keywords as boosted words (e.g., for the insurance domain: “Policy”, “Admission”, “Insurance”, “name of patient”, “diagnosis”, etc.)? Should they be added during training at the acoustic level, or to the language model (e.g., its vocabulary)?
- Is there a way to use an NER-based language model alongside the current language model?
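For the first question, one option that needs no retraining is to boost keywords at decode time, if the installed DeepSpeech version supports hot-word boosting (as far as I know, it was added in 0.9). The toy sketch below only illustrates the idea. It adds a fixed log-space boost to an n-best hypothesis for each hot word it contains, then re-ranks the list. The `HOT_WORDS` table, the boost values, the `rescore` helper and the example scores are all hypothetical. DeepSpeech's own decoder applies its boost inside the beam search rather than to a finished n-best list.

```python
from typing import Dict, List, Tuple

# Hypothetical hot-word table: word -> additive boost applied in log space.
HOT_WORDS: Dict[str, float] = {
    "policy": 5.0,
    "admission": 5.0,
    "insurance": 5.0,
    "diagnosis": 5.0,
}


def rescore(candidates: List[Tuple[str, float]],
            hot_words: Dict[str, float]) -> List[Tuple[str, float]]:
    """Toy n-best rescoring (not the DeepSpeech decoder).

    Adds each hot word's boost once per occurrence to the hypothesis's
    log-score, then returns the candidates sorted best-first.
    """
    rescored = []
    for text, log_score in candidates:
        boost = sum(hot_words.get(tok, 0.0) for tok in text.lower().split())
        rescored.append((text, log_score + boost))
    return sorted(rescored, key=lambda c: c[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical n-best list: (transcript, combined acoustic + LM log-score).
    nbest = [
        ("please check the polish number", -12.0),
        ("please check the policy number", -14.5),
    ]
    best_text, best_score = rescore(nbest, HOT_WORDS)[0]
    print(best_text, best_score)  # the "policy" hypothesis wins at -9.5
```

In DeepSpeech itself, I believe the corresponding call is `Model.addHotWord(word, boost)` on a loaded model, e.g. `model.addHotWord("policy", 10.0)` before `model.stt(audio)`. Check the exact name and a sensible boost range against the API docs for your version. On the language-model side, a complementary approach is to add domain sentences to the text used to build the KenLM-based external scorer, so that these words receive non-trivial probabilities.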