Where exactly is it mentioned in https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html that you need to create a new words.arpa and lm.binary?
Can we avoid taking the topic in three different directions here? The part about rebuilding the LM mentions generating the trie file.
PLEASE PLEASE PLEASE IF YOU FIND THE DOC INACCURATE, FILE ISSUES AND EXPLAIN WHAT YOU DON'T UNDERSTAND / MISS.
WE CAN'T GET INTO YOUR HEAD.
Agree. I edited my answer.
When reading https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html I could not find any reference to words.arpa or lm.binary. Maybe this should be added, because I only found out about them on other web pages.
Well, it is obvious to us, and to many other people who have had no problem doing their training, that you need to build your own language model. So please file an issue on GitHub explaining exactly what you miss or how you would word it. Even better if you send a patch adding the missing documentation.
Let me repeat: we cannot get into your head. We are deep down in the project, and some things are so obvious to us that we can't even tell why they are complicated for others. This is not arrogance or pushback.
If people don't tell us that they don't understand the current doc, or anything else, we can't improve.
Hi lissyx,
Does lm.binary apply to any language?
The reason I'm asking is that I'm working on zh-HK and id. How would running python generate_lm.py distinguish which language I'm working on?
Thanks
Yes. Please look at the doc explaining the acoustic / language model. You need an LM to perform decoding, whatever the language.
That question makes no sense to me. This code does not care about your language; it just builds a file that the decoder uses to help decode the acoustic model's output.
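To illustrate the point above: a language-model build only looks at token sequences in the text it is fed, and nothing in it knows which language that text is. The following is a toy sketch, assuming whitespace-separated tokens. It is a plain bigram counter written for illustration, not KenLM and not what generate_lm.py does internally, and the sample sentences are made up.

```python
from collections import Counter


def count_bigrams(sentences):
    """Count adjacent token pairs; a token is whatever whitespace separates."""
    counts = Counter()
    for sentence in sentences:
        # Sentence boundary markers, as commonly used in n-gram models.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical corpora: the same code runs unchanged on both.
english = ["turn on the light", "turn off the light"]
indonesian = ["nyalakan lampu itu", "matikan lampu itu"]

print(count_bigrams(english)[("the", "light")])
print(count_bigrams(indonesian)[("lampu", "itu")])
```

The language only enters through the text you feed in, which is why the same script can be used for zh-HK, id, or anything else.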
Where is this document on the acoustic / language model?
And thanks for the reply on
It's everywhere … in the original paper, etc.
@CheahHeng_Tan It seems you are searching for a lot of links / references. Can you please explain what you are working on?
@lissyx as soon as I have developed a model, I will write guidelines on how to train a model for braindead people :))
Returning to lm.binary.
I want to run ./DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --test_files …/data/CV/en/clips/test.csv,
In another tutorial I found that the author adds --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary, but as far as I can see the newest DeepSpeech version does not have such flags.
Another guide, TUTORIAL : How I trained a specific french model to control my robot,
also says to use the same flags --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary
Please run python DeepSpeech.py --helpfull
The tutorial might be for an outdated version; we can't do anything about that.
I would rather you pinpoint what is missing / send a PR to improve the documentation than write another this-is-my-tutorial that will get outdated very soon.
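A quick way to check which language-model flags your checkout actually supports is to filter the `--helpfull` output. Here is a minimal sketch, assuming it is run from the directory that contains DeepSpeech.py; the function names `filter_flag_help` and `helpfull_output` are made up for this example and are not part of DeepSpeech.

```python
import subprocess
import sys


def filter_flag_help(help_text, keyword):
    """Return the help lines that mention `keyword`, stripped of padding."""
    return [line.strip() for line in help_text.splitlines() if keyword in line]


def helpfull_output(script="DeepSpeech.py"):
    """Run `python <script> --helpfull` and return everything it printed."""
    result = subprocess.run(
        [sys.executable, script, "--helpfull"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture both streams to be safe
        universal_newlines=True,
    )
    # The exit code is deliberately ignored: help flags may exit non-zero.
    return result.stdout


# Usage, from the DeepSpeech checkout (assumed location):
#     for line in filter_flag_help(helpfull_output(), "lm_"):
#         print(line)
```

If nothing is printed for "lm_", the version you have checked out probably does not define those flags, which is a hint that the tutorial and the code are from different releases.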
I would like to talk; we can get in touch via WhatsApp +37127728463
Can you share how you imported lm.binary into your model?
I went through DeepSpeech --helpfull; no flag relates to the binary.
I only found the load flag, which allows "init" for initializing a fresh model.
@lissyx I would appreciate your help as well
I have already been helping you a lot. But you really need to be much more specific. I can't keep up with everybody doing their own stuff and cross-posting everywhere.
This is for 0.6.1. I can't help you if you don't properly describe your setup and/or if you keep changing versions.
@Stanislavs_Davidovics It seems that you are not using version 0.6.1 of Mozilla DeepSpeech.
python 0.6.1_DeepSpeech/DeepSpeech/DeepSpeech.py --helpfull
…
--lm_binary_path: path to the language model binary file created with KenLM
(default: 'data/lm/lm.binary')
--lm_trie_path: path to the language model trie file created with native_client/generate_trie
(default: 'data/lm/trie')
…
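Putting the pieces of this thread together, a 0.6.1 training call with a custom LM could be assembled roughly as follows. This is only a sketch under assumptions: the flag names and defaults are taken from the help output above and the posts in this thread, the CSV paths are placeholders, and `build_train_command` is a hypothetical helper, not part of DeepSpeech.

```python
import shlex


def build_train_command(train_csv, test_csv,
                        lm_binary="data/lm/lm.binary",
                        lm_trie="data/lm/trie"):
    """Assemble the argument list for a DeepSpeech 0.6.1 training run."""
    return [
        "python", "DeepSpeech.py",
        "--train_files", train_csv,
        "--test_files", test_csv,
        "--lm_binary_path", lm_binary,  # binary LM created with KenLM
        "--lm_trie_path", lm_trie,      # trie created with generate_trie
    ]


# Placeholder CSV paths; the LM paths are the ones quoted from the tutorials.
cmd = build_train_command(
    "my_data/train.csv",
    "my_data/test.csv",
    lm_binary="data/lm-test/lm.binary",
    lm_trie="data/lm-test/trie",
)
print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command instead of running it makes it easy to compare against `--helpfull` for your version before starting a long training job.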
@pj123 Hi man, have you trained a model? What about your model's accuracy? By the way, which language model are you training? I'm training Korean, but I get pretty low accuracy. Could you share some experience? Thanks!