I am trying to get DeepSpeech to recognize a fixed set of phrases. I have been able to install KenLM, but not to get it working. If anyone is able to consult on this project (on a free or paid basis), I would be grateful to hear from you.
Here you can find a tool for generating language models (scorer files) based on OSCAR data. You can take it as a starting point for your solution with a limited collection of phrases. I am about to extend it towards other data sources, more languages and better handling of numerical phrases.
The .compute script has all the necessary steps. I think the env-var exporting part is optional, as the variables default to locations in the project root. You can run bin/genlm --help in the project root to list all the options.
Sorry, let me be a little more verbose…
I have been able to install KenLM by following the instructions here: https://kheafield.com/code/kenlm/
But I do not understand how to use this tool in conjunction with DeepSpeech to recognize speech from a restricted set of sentences.
For example:
Hello, how are you?
I am fine, how are you?
I go shopping at the weekend.
Have you ever been to Paris?
I wouldn’t do that if I were you.
Ok. Let’s say these 5 sentences are my corpus.
Then how would I use KenLM in conjunction with DeepSpeech to recognize what the user says as more than likely being one of the sentences in this corpus?
I understand this is probably a newbie question…
You basically start a new txt file and put just those sentences in it, one per line. If alphabet.txt is lowercase without punctuation, transform your input the same way. Then run the KenLM instructions with an order of 1 (or 0), as this corpus is really tiny. Your scorer will be really small.
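To make the normalization step concrete, here is a minimal Python sketch. It assumes that alphabet.txt contains only lowercase a-z, the apostrophe and the space character; check your own alphabet.txt and adjust the pattern if it differs. The names normalize, write_corpus and corpus.txt are made up for illustration and are not part of DeepSpeech or KenLM.

```python
import re

# Assumption: alphabet.txt holds only lowercase a-z, apostrophe and space.
DISALLOWED = re.compile(r"[^a-z' ]")

def normalize(sentence: str) -> str:
    """Lowercase a sentence and replace characters outside the alphabet."""
    text = sentence.lower().replace("\u2019", "'")  # curly to straight apostrophe
    text = DISALLOWED.sub(" ", text)                # punctuation becomes a space
    return re.sub(r" +", " ", text).strip()         # collapse repeated spaces

SENTENCES = [
    "Hello, how are you?",
    "I am fine, how are you?",
    "I go shopping at the weekend.",
    "Have you ever been to Paris?",
    "I wouldn\u2019t do that if I were you.",
]

def write_corpus(path: str = "corpus.txt") -> None:
    """Write one normalized sentence per line (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in SENTENCES:
            handle.write(normalize(sentence) + "\n")

if __name__ == "__main__":
    write_corpus()
```

The resulting text file is what you would then pass to KenLM as the training corpus.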
Alternatively run the inference without an empty scorer location and you’ll get the output of the neural net before checking the dictionary. That might be more useful for language learners.
Sorry, I meant: run without a scorer by giving an empty argument. I saw it here in the forum the other day. This will output many repeated letters, like “hheelllo pauuull”.
Thank you for this information… but I am wondering whether I can ask for a little more detail… ideally a list of instructions that I would type into bash… to get from my list of sentences to a scorer…