Unable to predict domain-specific keywords perfectly

I have trained a DeepSpeech model on a 2000-hour Hinglish dataset. The model transcribes general words accurately, but it does not transcribe domain-specific keywords reliably.

I want the model to get these domain-specific keywords right, and I have a few questions:

  1. How can I boost these words (for example, in the insurance domain: “Policy”, “Admission”, “Insurance”, “name of patient”, “diagnosis”, etc.)? Do I need to add them to the training data for the acoustic model, or to the language model (for example, its vocabulary)?
  2. Is there a way to use an NER-based language model alongside the current language model?


NER-based language model? Can you be more specific?

You can augment or tune the language model; that is likely to be the most effective approach.

I read a paper on incorporating NER into the language model to improve the transcription of entities, so I thought we could build a separate NER-based language model. However, I have not found anything like this for DeepSpeech.

Could you suggest how to augment or tune the language model? I have not been able to figure out how to tune the LM.


Thanks. We have not worked on that topic, so I don’t really have any feedback on that. I’ll have to read the paper. Luckily, Le Mans and Nantes are pretty close to me :)

Building the language model is documented under data/lm/.
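One common way to augment the LM training text is to oversample domain sentences before building the n-gram model. Below is a minimal sketch, assuming you have a general plain-text corpus and a list of domain sentences. The function names, the `boost` factor, and the sample sentences are made up for illustration and are not part of DeepSpeech; the actual LM build steps are the ones described under data/lm/.

```python
from collections import Counter


def build_boosted_corpus(general_lines, domain_lines, boost=10):
    """Return LM training lines: general lines once, each domain line `boost` times.

    `boost` is a hypothetical tuning knob; a suitable value has to be found
    by checking WER on a held-out domain test set.
    """
    if boost < 1:
        raise ValueError("boost must be >= 1")
    corpus = [line.strip() for line in general_lines if line.strip()]
    for line in domain_lines:
        line = line.strip()
        if line:
            corpus.extend([line] * boost)
    return corpus


def unigram_counts(lines):
    """Count whitespace-separated tokens, to inspect the effect of boosting."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def write_corpus(path, lines):
    """Write one sentence per line, the usual input format for n-gram LM tools."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


# Illustrative data only; replace with your real corpora.
general = ["please tell me your name", "what is the date today"]
domain = [
    "the policy number is on the insurance card",
    "diagnosis and date of admission",
]

corpus = build_boosted_corpus(general, domain, boost=5)
counts = unigram_counts(corpus)
```

With `boost=5`, "policy" is counted 5 times and "date" 6 times (once from the general text plus 5 from the domain text), so domain n-grams get more probability mass. The resulting file would then go through the normal LM build instead of the original text.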

Is the dataset public? If so, could you please post a link? If not, is there a way to obtain it?

Very curious about the accuracy. Could you please share the WER?
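For anyone comparing numbers in this thread: WER is the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference. A minimal sketch follows; the function name and example sentences are made up, and DeepSpeech's own evaluation may normalize text differently.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("the policy number is valid", "the police number is valid")
```

Reporting WER separately on a general test set and on a domain-specific test set would show how much the domain keywords contribute to the errors.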