How to update the language model

The path is correct, and that error message is all I got in the console. Please see the image file I sent.

I’m unable to read your image, thus I can’t help you.

You also don’t provide any ls -hal of your source files or of the destination directory …

Hey, about the alphabet.txt file which we pass when generating the trie file: what should it contain, all the alphabet letters (a-z) or the words from the corpus with which we created the lm.binary file?

It’s an alphabet file, so it should contain … the alphabet ?


So it’s basically the lowercase letters from a-z, right?

No, basically it should cover any character present in your dataset. So any character present in the language model, in this case.
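To make that concrete, here is a minimal sketch (hypothetical paths, assuming plain-text transcripts) that builds an alphabet file covering every character in the corpus; alphabet.txt expects one character per line:

```python
# Minimal sketch (hypothetical paths): build an alphabet.txt that covers every
# character occurring in the training transcripts, written one character per
# line, which is the format DeepSpeech's alphabet file uses.
corpus_path = 'transcripts.txt'   # the text your lm.binary was built from

chars = set()
with open(corpus_path, encoding='utf-8') as f:
    for line in f:
        chars.update(line.rstrip('\n'))

with open('alphabet.txt', 'w', encoding='utf-8') as out:
    for ch in sorted(chars):
        out.write(ch + '\n')
```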

Thank you lissyx.

One more question: how can I get inferences for WAV files without doing any training…
or
get the text or transcript when you send an audio file?

./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav

With this command it works for only one file. When I give a folder, it throws an error.

How did you resolve this error, @sagarrokz.999?
ERROR: VectorFst::Write: Write failed:
Kindly help.

Thank you!

Hey, the problem was with the alphabet.txt file: I had the vocabulary instead of the characters in the file, so I got that error.

Which language are you making the language model/trie for?
I still have the same issue. Kindly help.
I am trying to generate the trie file for the Urdu language.

Yes, please help yourself and search a bit; this is heavily documented. The Python CLI tool is just here to demo the Python module. You can do multiple inferences by writing your own code.
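For example, here is a sketch of what such code could look like for a folder of WAV files, assuming the DeepSpeech 0.5.x Python package (the release whose CLI still takes --alphabet, --lm and --trie, as in the command above); paths and decoder constants are illustrative, and the API differs in later releases:

```python
# Sketch: batch inference over every WAV file in a directory, assuming the
# DeepSpeech 0.5.x Python API. Paths and constants are illustrative.
import glob
import os
import sys
import wave

import numpy as np
from deepspeech import Model

MODEL = 'models/output_graph.pbmm'
ALPHABET = 'models/alphabet.txt'
LM = 'models/lm.binary'
TRIE = 'models/trie'

# Decoder parameters as used by the 0.5.x demo client.
BEAM_WIDTH = 500
LM_ALPHA = 0.75
LM_BETA = 1.85
N_FEATURES = 26
N_CONTEXT = 9

ds = Model(MODEL, N_FEATURES, N_CONTEXT, ALPHABET, BEAM_WIDTH)
ds.enableDecoderWithLM(ALPHABET, LM, TRIE, LM_ALPHA, LM_BETA)

wav_dir = sys.argv[1] if len(sys.argv) > 1 else '.'
for path in sorted(glob.glob(os.path.join(wav_dir, '*.wav'))):
    with wave.open(path, 'rb') as fin:
        rate = fin.getframerate()
        audio = np.frombuffer(fin.readframes(fin.getnframes()), np.int16)
    # In 0.5.x, stt() takes the audio buffer and its sample rate.
    print(path, '->', ds.stt(audio, rate))
```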

Hey, I’m doing it for the English language.

Just check whether you have all the characters in it or not. Also, for Urdu you may need to add zer, zabar and pesh as well; that’s a guess, I’m not sure.

@lissyx thank you

I sorted it out

I added zer, zabar, etc. to my alphabet.txt file, but the error is still the same.

I made it for Urdu. Make sure you add all Urdu characters, including the characters which are not visible.
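One way to find such hard-to-see characters is to list the Unicode code points of everything that appears in your transcripts. A small sketch with a hypothetical file name:

```python
# Sketch (hypothetical path): list every distinct character in the transcripts
# with its Unicode code point and name, so diacritics (zer, zabar, pesh) and
# zero-width characters can be checked against alphabet.txt.
import unicodedata

with open('urdu_transcripts.txt', encoding='utf-8') as f:
    chars = sorted(set(f.read()))

for ch in chars:
    if ch == '\n':
        continue
    name = unicodedata.name(ch, 'UNKNOWN')
    print('U+%04X  %r  %s' % (ord(ch), ch, name))
```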

@ameerhamza.rz Hi, how many hours of data did you train on?
Great that you made it.

For the demo, I trained it on around 2 hours of corpus (PRUS Corpus + YouTube videos). Now I am training it on a 75-hour RUMI corpus.

Did you get the results?
What accuracy?