How to update the language model

Hey, I want to update the language model with domain-specific phrases and use the model to make my inferences.
How can I get this done?

This is already documented in data/lm/README.md.

Hello lissyx,
Thank you, I was able to create the lm.arpa and lm.binary files.

Then, for creating the trie file, I couldn’t find the generate_trie binary to execute the command. Please help me out with this.

Download it; it’s part of native_client.tar.xz.

Yeah, thank you. Then I followed the steps.
This is the command I used:
./generate_trie …lm_alphabet.txt /opt/deepspeech/lm.binary /opt/deepspeech/trie

But I’m getting this error and I’m not sure what it means:
ERROR: VectorFst::Write: Write failed:

No idea either, and it seems your error message is incomplete? Or maybe your ...lm_alphabet.txt is not a valid path?

The path is correct, and that error message is all I got in the console. Please see the image file I sent.

I’m unable to read your image, so I can’t help you.

You also didn’t provide any ls -hal output of your source files or of the destination directory…

Hey, about the alphabet.txt file that we pass to generate_trie: what should it contain? All the letters (a-z), or the words from the corpus we used to create the lm.binary file?

It’s an alphabet file, so it should contain… the alphabet?


So it’s basically the lowercase letters from a to z, right?

No, it should cover every character present in your dataset. So, in this case, every character present in the language model.
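Since the alphabet has to cover every character in the dataset, one way to avoid typing it by hand is to derive it from the same text you built the language model from. The sketch below collects the distinct characters of a UTF-8 corpus and writes them one per line. The one-character-per-line layout matches the alphabet file discussed here; the comment handling (lines starting with `#` are comments, so a literal `#` is written as `\#`) is an assumption about the file format, not something stated in this thread, and the helper names are hypothetical.

```python
"""Sketch: build an alphabet.txt from a plain-text corpus.

Assumptions: the corpus is UTF-8 text, the alphabet file holds one character
per line, lines starting with '#' are comments, and '\\#' means a literal '#'.
"""
import sys


def collect_characters(lines):
    """Return the sorted distinct characters found in the given lines."""
    chars = set()
    for line in lines:
        # The line terminator is not part of the transcript alphabet.
        chars.update(line.rstrip("\r\n"))
    return sorted(chars)


def format_alphabet(chars):
    """Render the characters as alphabet-file text, one per line."""
    out = ["# One character per line; the space character is its own line."]
    for ch in chars:
        # Assumption: a leading '#' marks a comment, so escape a literal '#'.
        out.append("\\#" if ch == "#" else ch)
    return "\n".join(out) + "\n"


def main(corpus_path, alphabet_path):
    with open(corpus_path, encoding="utf-8") as f:
        chars = collect_characters(f)
    with open(alphabet_path, "w", encoding="utf-8") as f:
        f.write(format_alphabet(chars))


if __name__ == "__main__":
    # Usage: python make_alphabet.py corpus.txt alphabet.txt
    main(sys.argv[1], sys.argv[2])
```

Note that the space character ends up on a line of its own; that line is easy to delete by accident in an editor, so it is worth checking before running generate_trie.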

Thank you lissyx.

One more question: how can I get inferences for the wav files without doing any training?
In other words, how do I get the text transcript when I send an audio file?

./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav

With this command it works for only one file; when I give it a folder, it throws an error.

How did you resolve this error, @sagarrokz.999?
ERROR: VectorFst::Write: Write failed:
Kindly help.

Thank you!

Hey, the problem was with the alphabet.txt file: I had the vocabulary (words) in the file instead of the alphabet, so I got that error.

Which language are you making the language model/trie for?
I still have the same issue. Kindly help.
I am trying to generate the trie file for the Urdu language.

Yes, please help yourself and search a bit; this is heavily documented. The Python CLI tool is just there to demo the Python module. You can run inference on multiple files by writing your own code.
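As a minimal example of "writing your own code" for a folder, the sketch below simply loops over the `.wav` files and calls the `./deepspeech` client once per file, using exactly the flags from the single-file command earlier in the thread. It deliberately wraps the CLI instead of the Python module, because the module’s constructor arguments differ between DeepSpeech releases. The binary path, the `models/` file names, and the assumption that the transcript is what the client prints on stdout should all be checked against your setup; the helper names are hypothetical.

```python
"""Sketch: transcribe every .wav file in a folder with the deepspeech client.

The client takes one --audio file, so we call it once per file. Flags match
the single-file command from the thread; paths are assumptions to adjust.
"""
import subprocess
import sys
from pathlib import Path


def build_command(wav_path, binary="./deepspeech", models="models"):
    """Return the argv list for one deepspeech invocation."""
    m = Path(models)
    return [
        binary,
        "--model", str(m / "output_graph.pbmm"),
        "--alphabet", str(m / "alphabet.txt"),
        "--lm", str(m / "lm.binary"),
        "--trie", str(m / "trie"),
        "--audio", str(wav_path),
    ]


def transcribe_folder(folder, run=subprocess.run):
    """Yield (file name, transcript) for each .wav file, in name order."""
    for wav in sorted(Path(folder).glob("*.wav")):
        result = run(build_command(wav), capture_output=True, text=True, check=True)
        # Assumption: the transcript is what the client prints on stdout.
        yield wav.name, result.stdout.strip()


if __name__ == "__main__":
    # Usage: python transcribe_folder.py path/to/wavs
    for name, text in transcribe_folder(sys.argv[1]):
        print(f"{name}\t{text}")
```

Starting the client once per file reloads the model every time, which is slow for large folders; loading the model once through the Python module and looping over files in the same process avoids that, at the cost of depending on your release’s API.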

Hey, I’m doing it for the English language.

Just check whether you have all the characters in it or not. Also, for Urdu you may need to add zer, zabar and pesh as well. That’s a guess; I’m not sure.
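The two causes mentioned in this thread (words in alphabet.txt instead of single characters, and characters in the corpus that the alphabet is missing) can be checked mechanically before running generate_trie. The sketch below flags alphabet entries that are not exactly one character and lists corpus characters absent from the alphabet, printing each with its Unicode name so invisible ones are readable. For Urdu, zabar, zer and pesh are the Arabic-script diacritics FATHA (U+064E), KASRA (U+0650) and DAMMA (U+064F), so they would show up here only if they actually occur in your corpus. As before, the comment convention (`#` lines skipped, `\#` for a literal `#`) is an assumption about the file format, and the helper names are hypothetical.

```python
"""Sketch: check that alphabet.txt covers every character in a corpus.

Assumptions: one alphabet entry per line, lines starting with '#' are
comments, and '\\#' stands for a literal '#'.
"""
import sys
import unicodedata


def parse_alphabet(lines):
    """Return the alphabet entries, skipping comment lines."""
    entries = []
    for line in lines:
        entry = line.rstrip("\r\n")
        if entry == "\\#":
            entries.append("#")
        elif entry.startswith("#"):
            continue
        else:
            entries.append(entry)
    return entries


def check_alphabet(entries, corpus_lines):
    """Return (entries that are not one character, corpus chars missing)."""
    not_single = [e for e in entries if len(e) != 1]  # e.g. whole words
    known = {e for e in entries if len(e) == 1}
    used = set()
    for line in corpus_lines:
        used.update(line.rstrip("\r\n"))
    return not_single, sorted(used - known)


def describe(ch):
    """Name a character so spaces and diacritics are visible in the report."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


if __name__ == "__main__":
    # Usage: python check_alphabet.py alphabet.txt corpus.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        entries = parse_alphabet(f)
    with open(sys.argv[2], encoding="utf-8") as f:
        bad, missing = check_alphabet(entries, f)
    for e in bad:
        print(f"not a single character: {e!r}")
    for ch in missing:
        print(f"missing from alphabet: {describe(ch)}")
```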