ValueError

https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html

Same; this is all in the TRAINING doc and in data/lm.

I went through DeepSpeech’s “Training your own model” and the data/lm README files many times, but they don’t explain why you need to build the .arpa and .binary files. The training section of the DeepSpeech docs is also silent about this part. How do you link those two guides together to understand what is going on when you have a new alphabet to add?

Decoding requires knowing the alphabet. Your vocabulary gets translated into the lm.binary file. So changing the alphabet will mean your lm.binary is invalid.
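A quick way to see that dependency: a minimal sketch that lists characters in your LM text or vocabulary that the alphabet does not cover. The helpers `load_alphabet` and `uncovered_characters` are hypothetical, not DeepSpeech code, and the sketch assumes the alphabet file has one symbol per line with `#` lines treated as comments, as in the stock alphabet.txt. Anything it reports cannot be represented with that alphabet.

```python
# Hypothetical helpers (not part of DeepSpeech): check that every character
# in a text corpus / vocabulary is covered by an alphabet file laid out as
# one symbol per line. Lines starting with '#' are assumed to be comments.

def load_alphabet(lines):
    """Return the set of symbols listed in alphabet-file lines."""
    symbols = set()
    for raw in lines:
        line = raw.rstrip("\n")  # keep a lone " " line: it is the space symbol
        if line == "" or line.startswith("#"):
            continue
        symbols.add(line)
    return symbols


def uncovered_characters(alphabet, text):
    """Characters that appear in `text` but are missing from `alphabet`."""
    return {ch for ch in text if ch != "\n" and ch not in alphabet}


if __name__ == "__main__":
    alphabet = load_alphabet([" \n", "a\n", "b\n", "c\n", "# comment\n"])
    print(sorted(uncovered_characters(alphabet, "abc cab\nbad")))  # ['d']
```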

Where exactly is it mentioned in https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html that you need to create new words.arpa and lm.binary files?

Can we avoid taking this topic in three different directions here? The “Rebuilding the LM” part mentions generating the trie file.
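To tie the two guides together, here is a rough sketch of the pipeline expressed as the commands it would run: KenLM’s `lmplz` writes words.arpa (plain text), `build_binary` compiles it into lm.binary, and `generate_trie` builds the trie from the alphabet plus lm.binary, which is why changing the alphabet means rebuilding these files. The tools are real, but the options and the `generate_trie` argument order below are assumptions, not necessarily what data/lm/generate_lm.py passes. `lm_build_commands` and `run` are hypothetical helpers, and `run` only prints unless you set `dry_run=False`.

```python
# Sketch of the LM build pipeline as the command lines it would run.
# Tool names are real (KenLM's lmplz and build_binary, DeepSpeech 0.6.x's
# native_client/generate_trie); the options and argument order are assumptions.
import shlex
import subprocess


def lm_build_commands(corpus="corpus.txt", order=5, alphabet="data/alphabet.txt",
                      arpa="words.arpa", binary="lm.binary", trie="trie",
                      generate_trie="generate_trie"):
    """Return the three build steps as argument lists (hypothetical helper)."""
    return [
        # 1. Count n-grams in the normalized text and write a plain-text ARPA model.
        ["lmplz", "--order", str(order), "--text", corpus, "--arpa", arpa],
        # 2. Compile the ARPA file into KenLM's binary format for fast loading.
        ["build_binary", arpa, binary],
        # 3. Build the decoder trie; this step reads the alphabet, so it must be
        #    redone whenever alphabet.txt changes (argument order assumed).
        [generate_trie, alphabet, binary, trie],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(lm_build_commands())
```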

PLEASE PLEASE PLEASE, IF YOU FIND THE DOC INACCURATE, FILE ISSUES AND EXPLAIN WHAT YOU DON’T UNDERSTAND / WHAT IS MISSING.

WE CAN’T GET INTO YOUR HEAD.

Agree. I edited my answer.

When reading https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html, I could not find any reference to words.arpa and lm.binary. Maybe this should be added, because I only found out about them on other web pages.
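For what it’s worth, words.arpa is a plain-text file in the standard ARPA n-gram format, and lm.binary is the same model compiled by KenLM for fast loading. A minimal sketch that reads the `\data\` header of an ARPA file to see how many n-grams of each order it contains; `arpa_ngram_counts` is a hypothetical helper and `SAMPLE` is a made-up toy model, not real data.

```python
# Reads the \data\ header of an ARPA n-gram file (the plain-text format
# used by words.arpa) and returns the n-gram count for each order.
import io
import re


def arpa_ngram_counts(lines):
    counts = {}
    in_data = False
    for line in lines:
        line = line.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            match = re.fullmatch(r"ngram (\d+)=(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):  # first section header (\1-grams:) ends the header
                break
    return counts


# Made-up toy model, only to exercise the parser.
SAMPLE = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0\t<s>\t-0.3
-0.5\thello\t-0.2
-0.5\t</s>

\\2-grams:
-0.2\t<s> hello
-0.1\thello </s>

\\end\\
"""

if __name__ == "__main__":
    print(arpa_ngram_counts(io.StringIO(SAMPLE)))  # {1: 3, 2: 2}
```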

Well, it is obvious to us, and to the many other people who have trained without any problem, that you need to build your own language model. So please file an issue on GitHub explaining exactly what you are missing or how you would word it. Even better, send a patch adding the missing documentation.

Let me repeat: we cannot get into your head. We are deep down in the project, and some things are so obvious to us that we can’t even see why they are complicated for others. This is not arrogance or pushback.

If people don’t tell us that they don’t understand the current doc, or anything else, we can’t improve it.

Hi lissyx,

Does lm.binary apply to any language?
The reason I’m asking is that I’m working on zh-HK and id. How would running python generate_lm.py know which language I’m working on?

Thanks

Yes. Please look at the doc explaining acoustic / language models. You need an LM to perform decoding, whatever the language.

That question makes no sense to me. This code does not care about your language; it just builds a file that the decoder uses to help decode the acoustic model’s output.
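To illustrate why the language doesn’t matter: a toy sketch of LM rescoring, using an add-one-smoothed bigram model over whitespace-separated tokens as a stand-in for KenLM (this is not what DeepSpeech uses internally). Candidates are ranked by acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times the word count. As far as I know DeepSpeech exposes similar weights as lm_alpha / lm_beta, but its decoder applies them inside CTC beam search rather than over finished sentences. `train_bigram_lm` and `rescore` are hypothetical names, and the weights and scores are illustrative.

```python
# Toy illustration: an add-one-smoothed bigram LM built from plain text,
# used to rescore candidate transcripts. Nothing here inspects the language:
# tokens are just whitespace-separated strings.
import math
from collections import Counter


def train_bigram_lm(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # counts of each context token
        bigrams.update(zip(tokens, tokens[1:]))
    vocab_size = len(set(unigrams) | {"</s>"})

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothing so unseen bigrams still get a small probability.
            total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
        return total

    return log_prob


def rescore(candidates, lm_log_prob, alpha=0.75, beta=1.85):
    """candidates: (text, acoustic_log_prob) pairs; returns the best text."""
    def score(item):
        text, acoustic = item
        return acoustic + alpha * lm_log_prob(text) + beta * len(text.split())
    return max(candidates, key=score)[0]


if __name__ == "__main__":
    lm = train_bigram_lm(["the cat sat", "the cat ran", "a dog sat"])
    # The acoustic scores slightly prefer the misspelled hypothesis.
    print(rescore([("the cat sat", -4.1), ("the cad sat", -4.0)], lm))  # the cat sat
```

Setting `alpha=0.0` turns the LM off, and the acoustically preferred "the cad sat" wins instead. The same code works for any script once the text is split into tokens.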

Where is this document on acoustic/language model?

And thanks for the reply on

It’s everywhere … in the original paper, etc.

@CheahHeng_Tan It seems you are searching for a lot of links / references; can you please explain what you are working on?

@lissyx as soon as I develop a model, I will write guidelines on how to train a model for braindead people :))

Returning to lm.binary.

I want to run ./DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --test_files …/data/CV/en/clips/test.csv.
In another tutorial, the author adds --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary, but the newest DeepSpeech version does not seem to have such flags.

Another guide, TUTORIAL : How I trained a specific french model to control my robot, also says to use the same flags: --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary

Please run python DeepSpeech.py --helpfull
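If you are not sure which version you have checked out, a small hypothetical helper can run that command and check for the LM flags. The script path is an assumption, and the regex assumes the absl help layout shown later in this thread (`--flag_name: description`).

```python
# Hypothetical helper: runs `python DeepSpeech.py --helpfull` (script path is
# an assumption) and reports which LM-related flags the checkout knows about.
import re
import subprocess
import sys

LM_FLAGS = ("lm_binary_path", "lm_trie_path")


def flags_in_help(help_text, wanted=LM_FLAGS):
    """Map each wanted flag name to True if it appears as '--name:' in the help."""
    found = set(re.findall(r"--([A-Za-z0-9_]+)\s*:", help_text))
    return {flag: flag in found for flag in wanted}


def check_checkout(script="DeepSpeech.py"):
    result = subprocess.run([sys.executable, script, "--helpfull"],
                            capture_output=True, text=True)
    # Look at both streams, since help output may go to either one.
    return flags_in_help(result.stdout + result.stderr)


if __name__ == "__main__":
    print(check_checkout())
```

If both flags come back False, the checkout is not 0.6.1, so tutorials written for 0.6.1 will not match its flags.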

Tutorials might be for outdated versions; we can’t do anything about that.

I would rather you pinpoint problems / send a PR to improve the documentation than write another this-is-my-tutorial that will get outdated very soon.

I would like to talk; we can get in touch via WhatsApp: +37127728463

Can you share how you imported lm.binary into your model?

I went through DeepSpeech --helpfull and no flag relates to the binary.

The only related flag I found is --load, which accepts ‘init’ for initializing a fresh model.

@lissyx I would appreciate your help as well.

I have already been helping you a lot. But you really need to be much more specific; I can’t keep up with everybody doing their own stuff and cross-posting everywhere.

This is for 0.6.1. I can’t help you if you don’t describe your setup properly and/or if you keep changing versions.

@Stanislavs_Davidovics It seems that you are not using 0.6.1 version of Mozilla Deepspeech.

python 0.6.1_DeepSpeech/DeepSpeech/DeepSpeech.py --helpfull

--lm_binary_path: path to the language model binary file created with KenLM
(default: 'data/lm/lm.binary')
--lm_trie_path: path to the language model trie file created with native_client/generate_trie
(default: 'data/lm/trie')
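With those flags, a 0.6.1 training command can pass the LM files explicitly. A minimal sketch that only assembles the argument list; `training_command` is hypothetical, and the paths are placeholders rather than the elided paths from earlier in the thread. Since the defaults above already point at data/lm/lm.binary and data/lm/trie, the flags only matter when your files live elsewhere.

```python
# Sketch: assemble a 0.6.1-style training command that passes the LM files
# explicitly. Flag names come from the help output quoted above; the helper
# name and all paths are placeholders.
import shlex


def training_command(train_csv, test_csv, lm_binary="data/lm/lm.binary",
                     lm_trie="data/lm/trie", script="DeepSpeech.py"):
    return [
        "python", script,
        "--train_files", train_csv,
        "--test_files", test_csv,
        "--lm_binary_path", lm_binary,
        "--lm_trie_path", lm_trie,
    ]


if __name__ == "__main__":
    cmd = training_command("path/to/train.csv", "path/to/test.csv",
                           lm_binary="data/lm-test/lm.binary",
                           lm_trie="data/lm-test/trie")
    print(shlex.join(cmd))
```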

@pj123 hi man :) Have you trained a model? What is your model’s accuracy? By the way, which language are you training a model for? I’m training Korean, but I get pretty low accuracy. Could you share some experience? Thanks!