How to update the language model


(Sagarrokz 999) #1

Hey, I want to update the language model with domain-specific phrases and use that model for inference.
How can I get this done?


(Lissyx) #2

This is already documented in data/lm/README.md.


(Sagarrokz 999) #3

Hello lissyx,
Thank you, I was able to create the lm.arpa and lm.binary files.

To create the trie file, though, I couldn’t find the generate_trie binary to run the command. Please help me out with this.


(Lissyx) #4

Download it, it’s part of native_client.tar.xz


(Sagarrokz 999) #5

Yeah, thank you. I then followed the steps.
This is the command I used:
./generate_trie …lm_alphabet.txt /opt/deepspeech/lm.binary /opt/deepspeech/trie

But I’m getting this error, and I’m not sure what it means:
ERROR: VectorFst::Write: Write failed:


(Lissyx) #6

No idea either, and it seems your error message is incomplete? Or maybe your ...lm_alphabet.txt is not a valid path?


(Sagarrokz 999) #8

The path is correct, and that error message is all I got in the console. Please see the image file I sent.


(Lissyx) #9

I’m unable to read your image, thus I can’t help you.


(Lissyx) #10

You also haven’t provided any ls -hal output for your source files or the destination directory …


(Sagarrokz 999) #11

Hey, about the alphabet.txt file that we pass to generate_trie: what should it contain? All the letters (a-z), or the words from the corpus we used to create the lm.binary file?


(Lissyx) #12

It’s an alphabet file, so it should contain … the alphabet ?


(Sagarrokz 999) #13

So it’s basically the lowercase letters a-z, right?


(Lissyx) #14

No, basically it should cover any character present in your dataset. So any character present in the language model, in this case.
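
As a rough illustration of "any character present in your dataset", the sketch below collects every distinct character from a text corpus and writes them out one per line. The one-character-per-line layout is an assumption about what alphabet.txt looks like, the function names and file paths are hypothetical, and a literal # in the corpus may need special handling that is not covered here.

```python
def collect_characters(corpus_path):
    """Return the sorted list of distinct characters used in the corpus."""
    chars = set()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            # Drop only the line terminator; spaces inside the line are kept.
            chars.update(line.rstrip("\r\n"))
    return sorted(chars)


def write_alphabet(chars, alphabet_path):
    """Write one character per line (assumed alphabet.txt layout)."""
    with open(alphabet_path, "w", encoding="utf-8") as out:
        for ch in chars:
            out.write(ch + "\n")


# Hypothetical usage, with the corpus that was used to build lm.binary:
# write_alphabet(collect_characters("corpus.txt"), "alphabet.txt")
```

The corpus passed in should be the same text the language model was built from, so that the two stay in sync.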


(Sagarrokz 999) #15

Thank you, lissyx.

One more question: how can I run inference on WAV files without doing any training? In other words, how do I get the transcript when I pass in an audio file?

./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav

This command works for only one file. When I give it a folder, it throws an error.


(Hafsa Farooq) #16

How did you resolve this error, @sagarrokz.999?
ERROR: VectorFst::Write: Write failed:
Kindly help.

Thank you!


(Sagarrokz 999) #17

Hey, the problem was with the alphabet.txt file: I had the vocabulary in it instead of the alphabet, so I got that error.


(Hafsa Farooq) #18

Which language are you building the language model/trie for?
I still have the same issue. Kindly help.
I am trying to generate a trie file for the Urdu language.


(Lissyx) #19

Yes, please help yourself and search a bit; this is heavily documented. The Python CLI tool is just here to demo the Python module. You can run inference on multiple files by writing your own code.
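
One possible shape for "your own code" is a small wrapper that runs the single-file command from earlier in the thread once per WAV file in a folder. This is only a sketch: the function names are hypothetical, the paths are copied from that command, and it assumes the client prints the transcript on standard output.

```python
import subprocess
from pathlib import Path

# Binary and model paths copied from the command posted earlier in the
# thread; adjust them to your own setup.
DEEPSPEECH = "./deepspeech"
MODEL_ARGS = [
    "--model", "models/output_graph.pbmm",
    "--alphabet", "models/alphabet.txt",
    "--lm", "models/lm.binary",
    "--trie", "models/trie",
]


def build_command(wav_path):
    """Build the single-file command line for one audio file."""
    return [DEEPSPEECH, *MODEL_ARGS, "--audio", str(wav_path)]


def transcribe_folder(folder, run=subprocess.run):
    """Run the client once per .wav file; return {file name: transcript}.

    Assumes the transcript is what the client writes to stdout.
    """
    transcripts = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        result = run(
            build_command(wav),
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise if the client exits with an error
        )
        transcripts[wav.name] = result.stdout.strip()
    return transcripts


# Hypothetical usage:
# for name, text in transcribe_folder("audio_folder").items():
#     print(name, text)
```

Starting a new process per file reloads the model every time, so this is simple rather than fast. Loading the model once through the Python module avoids that cost.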


(Sagarrokz 999) #20

Hey, I’m doing it for the English language.

Just check whether you have all the characters in it or not. Also, for Urdu you may need to add zer, zabar, and pesh as well. It’s a guess; I’m not sure.
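
That check can be scripted. The sketch below flags alphabet entries longer than one character (the vocabulary-instead-of-alphabet mistake mentioned above) and lists corpus characters missing from the alphabet, such as diacritics. The function names are hypothetical, and it assumes alphabet.txt holds one character per line with lines starting with # treated as comments.

```python
import unicodedata


def load_alphabet(alphabet_path):
    """Read alphabet entries, skipping blank lines and assumed '#' comments."""
    symbols = []
    with open(alphabet_path, encoding="utf-8") as f:
        for line in f:
            entry = line.rstrip("\r\n")
            if entry == "" or entry.startswith("#"):
                continue
            symbols.append(entry)
    return symbols


def check_alphabet(alphabet_path, corpus_path):
    """Return (entries longer than one character, corpus chars not covered)."""
    symbols = load_alphabet(alphabet_path)
    too_long = [s for s in symbols if len(s) != 1]
    known = set(symbols)
    missing = set()
    with open(corpus_path, encoding="utf-8") as corpus:
        for line in corpus:
            missing.update(ch for ch in line.rstrip("\r\n") if ch not in known)
    return too_long, sorted(missing)


def describe(ch):
    """Format a character as code point plus Unicode name for reporting."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNKNOWN"))


# Hypothetical usage:
# too_long, missing = check_alphabet("alphabet.txt", "corpus.txt")
# print("Entries that are not single characters:", too_long)
# print("Missing characters:", [describe(ch) for ch in missing])
```

Printing the Unicode names makes combining marks visible, which are easy to miss when reading the corpus by eye.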