How to update the language model

sagarrokz.999 · January 21, 2019, 12:26pm

Hey, I want to update the language model with domain-specific phrases and use the model for inference.
How can I get this done?

lissyx · January 21, 2019, 1:07pm

This is already documented in data/lm/README.md.

sagarrokz.999 · January 29, 2019, 6:27am

Hello lissyx,
Thank you, I was able to create the lm.arpa and lm.binary files.

To create the trie file, though, I couldn’t find the generate_trie binary needed to run the command. Please help me out with this.

lissyx · January 29, 2019, 7:11am

Download it; it’s part of native_client.tar.xz.

sagarrokz.999 · January 29, 2019, 7:27am

Yeah, thank you. I then followed the steps.
This is the command I used:
./generate_trie …lm_alphabet.txt /opt/deepspeech/lm.binary /opt/deepspeech/trie

But I’m getting this error, and I’m not sure what it means:
ERROR: VectorFst::Write: Write failed:

lissyx · January 29, 2019, 8:02am

No idea either, and it seems your error message is incomplete? Or maybe your ...lm_alphabet.txt is not a valid path?

sagarrokz.999 · January 29, 2019, 9:12am

sagarrokz.999 · January 29, 2019, 9:13am

The path is correct, and that error message is all I got in the console. Please see the image file I sent.

lissyx · January 29, 2019, 9:49am

I’m unable to read your image, thus I can’t help you.

lissyx · January 29, 2019, 9:52am

You also don’t provide any ls -hal of your source files or of the destination directory …

sagarrokz.999 · January 29, 2019, 10:46am

Hey, what should the alphabet.txt file that we pass to generate_trie contain: all the letters (a-z), or the words from the corpus we used to create the lm.binary file?

lissyx · January 29, 2019, 10:49am

It’s an alphabet file, so it should contain … the alphabet?

sagarrokz.999 · January 29, 2019, 10:51am

So it’s basically the lowercase letters from a to z, right?

lissyx · January 29, 2019, 10:57am

No, basically it should cover any character present in your dataset. So any character present in the language model, in this case.
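
One way to follow this advice is to derive the character set directly from the text used to build the language model. The sketch below is only an illustration: the function names are hypothetical, and it assumes the alphabet file lists one character per line, with the space character as a line of its own.

```python
from typing import Iterable, List


def build_alphabet(lines: Iterable[str]) -> List[str]:
    """Return the sorted unique characters found in the given text lines."""
    chars = set()
    for line in lines:
        # Drop only the line terminator; spaces inside the line are kept,
        # because the space character must be part of the alphabet too.
        chars.update(line.rstrip("\r\n"))
    return sorted(chars)


def write_alphabet(chars: Iterable[str], path: str) -> None:
    """Write one character per line (assumed alphabet.txt layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for char in chars:
            handle.write(char + "\n")
```

Calling `build_alphabet(open("corpus.txt", encoding="utf-8"))` on the corpus (the file name is a placeholder) and passing the result to `write_alphabet` would give a starting point that still needs a manual review, for example to drop characters that should be cleaned out of the corpus instead.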

sagarrokz.999 · January 29, 2019, 4:40pm

Thank you, lissyx.

One more question: how can I run inference on WAV files without doing any training? In other words, how do I get the text or transcript when I send an audio file?

./deepspeech --model models/output_graph.pbmm --alphabet models/alphabet.txt --lm models/lm.binary --trie models/trie --audio audio_input.wav

This command works for only one file. When I give it a folder, it throws an error.

noor_e_emaan11 · February 1, 2019, 2:21pm

How did you resolve this error, @sagarrokz.999?
ERROR: VectorFst::Write: Write failed:
Kindly help.

Thank you!

sagarrokz.999 · February 5, 2019, 5:18am

Hey, the problem was with the alphabet.txt file: I had put the vocabulary in the file instead of the alphabet, so I got that error.
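
A quick sanity check can catch this kind of mistake before running generate_trie. The following is a minimal sketch with a hypothetical function name; it assumes every entry is a single character on its own line and that lines starting with `#` are comments, which should be checked against the alphabet file shipped with your DeepSpeech version.

```python
from typing import Iterable, List, Tuple


def find_bad_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for entries that are not one character."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        entry = raw.rstrip("\r\n")
        # Assumption: lines starting with '#' are comments and are skipped.
        if entry.startswith("#"):
            continue
        # A whole word (vocabulary entry) or an empty line is reported.
        if len(entry) != 1:
            bad.append((number, entry))
    return bad
```

If `find_bad_entries(open("alphabet.txt", encoding="utf-8"))` returns anything, the file probably contains words rather than characters. Note that `len` counts Unicode code points, so a letter followed by a combining mark is reported as two characters.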

noor_e_emaan11 · February 12, 2019, 9:03am

Which language are you building the language model/trie for?
I still have the same issue. Kindly help.
I am trying to generate the trie file for Urdu.

lissyx · February 12, 2019, 9:31am

Yes, please help yourself and search a bit; this is heavily documented. The Python CLI tool is just there to demo the Python module. You can run inference on multiple files by writing your own code.
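
As a rough illustration of "writing your own code", the sketch below loops over a folder and runs the same command shown earlier in the thread once per WAV file. It does not use the Python module directly, since its constructor arguments depend on the installed version. The function names are hypothetical, and it assumes the client prints the transcript to standard output.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def build_command(wav_path: Path, models_dir: Path,
                  binary: str = "./deepspeech") -> List[str]:
    """Build the argument list for one file, mirroring the thread's command."""
    return [
        binary,
        "--model", str(models_dir / "output_graph.pbmm"),
        "--alphabet", str(models_dir / "alphabet.txt"),
        "--lm", str(models_dir / "lm.binary"),
        "--trie", str(models_dir / "trie"),
        "--audio", str(wav_path),
    ]


def list_wav_files(folder: Path) -> List[Path]:
    """Return the .wav files directly inside folder, in sorted order."""
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".wav")


def transcribe_folder(folder: Path, models_dir: Path) -> Dict[str, str]:
    """Run the CLI once per file and collect what it prints to stdout."""
    results = {}
    for wav in list_wav_files(folder):
        done = subprocess.run(
            build_command(wav, models_dir),
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise if the client exits with an error
        )
        # Assumption: the transcript is the client's standard output.
        results[wav.name] = done.stdout.strip()
    return results
```

Something like `transcribe_folder(Path("audio"), Path("models"))` would then return a mapping from file name to transcript. Starting a new process per file reloads the model every time, so for large batches it is probably worth loading the model once through the Python module instead.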

sagarrokz.999 · February 12, 2019, 9:58am

Hey, I’m doing it for English.

Just check whether you have all the characters in it. Also, for Urdu you may need to add zer, zabar and pesh as well. It’s a guess; I’m not sure.
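
Rather than guessing, the corpus can be compared against the alphabet to list exactly which characters are missing. This is a small sketch with hypothetical function names. In Unicode, zer, zabar and pesh are the combining marks named ARABIC KASRA (U+0650), ARABIC FATHA (U+064E) and ARABIC DAMMA (U+064F), which are easy to overlook because they attach to the preceding letter on screen.

```python
import unicodedata
from typing import Iterable, List, Set


def missing_characters(corpus_lines: Iterable[str],
                       alphabet: Set[str]) -> List[str]:
    """Return corpus characters that the alphabet does not contain, sorted."""
    seen = set()
    for line in corpus_lines:
        # Strip only the line terminator so spaces are still checked.
        seen.update(line.rstrip("\r\n"))
    return sorted(seen - alphabet)


def describe(chars: Iterable[str]) -> List[str]:
    """Format each character as 'U+XXXX NAME' so combining marks are visible."""
    return ["U+%04X %s" % (ord(c), unicodedata.name(c, "UNKNOWN"))
            for c in chars]
```

Printing `describe(missing_characters(corpus_lines, alphabet))` shows the code point and name of every character that appears in the text but not in the alphabet, which makes invisible or combining characters easy to spot.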
