I have built an Arabic model which runs just fine. The output is acceptable in most cases (WER = 0.1).
In my application I need to restrict the output to a fixed list of options, perhaps to reach WER = 0.01. Example use case:
App: Do you accept the agreement?
User-voice: Yes / No / Read (Give this list to the scorer)
App: What do you want to order?
User-voice: Breakfast / Lunch / Dinner (Give this list to the scorer)
…
…
And the business logic continues, and the dialog goes on.
How can I implement such functionality with the current scorer?
Correct, it increases the probability of that. You can try building a closed-vocabulary language model/scorer and using that: put the allowed words in a text file and then run generate_scorer on it. But I don’t think there is a way to guarantee that you will only get that set of words out.
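Since a closed-vocabulary scorer may still let other words through, one workaround is to post-process the transcript and snap it to the closest allowed option. This is not a scorer feature, just plain string matching with Python's standard difflib module. A minimal sketch, where the function name snap_to_choices and the cutoff value are assumptions for illustration:

```python
import difflib


def snap_to_choices(transcript, choices, cutoff=0.0):
    """Return the allowed choice most similar to the transcript.

    Hypothetical helper, not part of any STT library. With cutoff=0.0 some
    choice is always returned; a higher cutoff returns None when nothing is
    similar enough, so the app can ask the user to repeat.
    """
    # Compare on a normalized form, but return the choice as originally written.
    by_normalized = {choice.strip().lower(): choice for choice in choices}
    matches = difflib.get_close_matches(
        transcript.strip().lower(), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[matches[0]] if matches else None


# Example with the dialog steps from the question.
print(snap_to_choices("yess", ["Yes", "No", "Read"]))
print(snap_to_choices("launch", ["Breakfast", "Lunch", "Dinner"]))
```

Character-level similarity is a crude stand-in for acoustic similarity, so short options that are spelled alike can still be confused. Setting a cutoff and re-prompting on None is probably safer than always forcing a match.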
If you need a hard guarantee, you could probably build a multiclass classifier on top of the softmax output of the acoustic model, trained for your specific vocabulary. But I’d suggest trying the simpler stuff first.
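To make the classifier idea concrete, here is a toy sketch. It assumes you can already get the acoustic model's per-frame softmax output as a list of probability vectors (how to extract that is not shown here), and it swaps in a nearest-centroid rule for a properly trained classifier. All function names are hypothetical.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def fit_centroids(examples):
    """examples: list of (frames, label); frames is a list of softmax vectors.

    Returns {label: centroid of the mean-pooled utterances with that label}.
    """
    grouped = {}
    for frames, label in examples:
        grouped.setdefault(label, []).append(mean_pool(frames))
    return {label: mean_pool(pooled) for label, pooled in grouped.items()}


def classify(frames, centroids):
    """Return the label whose centroid is closest to the pooled utterance."""
    pooled = mean_pool(frames)
    return min(centroids, key=lambda label: math.dist(pooled, centroids[label]))


# Toy data: a made-up 3-symbol alphabet and two frames per utterance.
yes_frames = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
no_frames = [[0.1, 0.8, 0.1], [0.05, 0.9, 0.05]]
centroids = fit_centroids([(yes_frames, "yes"), (no_frames, "no")])
print(classify([[0.7, 0.2, 0.1]], centroids))
```

Because the output can only be one of the trained labels, this gives the guarantee the scorer cannot. Mean pooling throws away the order of the frames, though, so a real version would likely need a sequence-aware model and recorded examples of each option.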