ValueError

Stanislavs_Davidovics · March 19, 2020, 9:12am

Where exactly is it mentioned in https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html that you need to create a new words.arpa and lm.binary?

lissyx · March 19, 2020, 9:11am

Can we avoid taking the topic in three different directions here? The section on rebuilding the LM mentions generating the trie file.

PLEASE PLEASE PLEASE IF YOU FIND THE DOC INACCURATE, FILE ISSUES AND EXPLAIN WHAT YOU DON’T UNDERSTAND / MISS.

WE CAN’T GET INTO YOUR HEAD.

Stanislavs_Davidovics · March 19, 2020, 9:17am

Agree. I edited my answer.

When reading https://deepspeech.readthedocs.io/en/v0.6.1/TRAINING.html I could not find any reference to words.arpa and lm.binary. Maybe this should be added, because I only found out about them on other web pages.

lissyx · March 19, 2020, 9:20am

Well, it is obvious to us, and to many other people who have not had any problem doing their training, that you need to build your own language model. So please file an issue on GitHub explaining exactly what you are missing or how you would word it. Even better if you send a patch adding the missing documentation.

Let me repeat: we cannot get into your head. We are deep into the project, and some things are so obvious to us that we can’t even tell why they are complicated for others. This is not arrogance or pushback.

If people don’t tell us what they don’t understand in the current docs or elsewhere, we can’t improve.

CheahHeng_Tan · March 19, 2020, 9:22am

Hi lissyx,

Does lm.binary apply to any language?
The reason I’m asking is that I’m working on the zh-HK and id languages. How would running python generate_lm.py distinguish which language I’m working on?

Thanks

lissyx · March 19, 2020, 9:29am

Yes. Please look at the docs explaining the acoustic / language model. You need an LM to perform decoding, whatever the language.

That question makes no sense to me. This code does not care about your language; it just builds a file that the decoder uses to help decode the acoustic model’s output.
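
To illustrate why the language does not matter, here is a toy sketch. It is not what generate_lm.py does internally (the real lm.binary is created with KenLM); it only shows that an n-gram model is built from counts over whatever tokens the training text contains. The function name `count_bigrams` and the sample sentences are made up for illustration.

```python
from collections import Counter


def count_bigrams(sentences):
    """Count adjacent token pairs; tokens are whatever whitespace splitting yields."""
    counts = Counter()
    for sentence in sentences:
        # Add sentence boundary markers so starts and ends are counted too.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


# The same code runs unchanged on text in any language (hypothetical samples).
english = count_bigrams(["turn the light on", "turn the radio off"])
indonesian = count_bigrams(["nyalakan lampu", "matikan lampu"])
```

Nothing in the counting step knows which language it is looking at; only the input text differs. For a language written without spaces, the text would presumably need to be split into tokens before a step like this.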

CheahHeng_Tan · March 19, 2020, 9:36am

Where is this document on acoustic/language model?

And thanks for the reply on

lissyx · March 19, 2020, 9:39am

It’s everywhere … in the original paper, etc.

lissyx · March 19, 2020, 9:46am

@CheahHeng_Tan It seems you are searching for a lot of links / references. Can you please explain what you are working on?

Stanislavs_Davidovics · March 19, 2020, 9:53am

@lissyx as soon as I have developed a model, I will write guidelines on how to train a model for braindead people :))

Returning to lm.binary.

I want to run ./DeepSpeech.py --train_files …/data/CV/en/clips/train.csv --test_files …/data/CV/en/clips/test.csv,
In another tutorial I found that the author adds --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary, but as far as I can see the newest DeepSpeech version does not have such flags.

Another guide, “TUTORIAL : How I trained a specific french model to control my robot”,
also says to use the same flags: --lm_trie_path data/lm-test/trie --lm_binary_path data/lm-test/lm.binary

lissyx · March 19, 2020, 10:00am

Please run python DeepSpeech.py --helpfull
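
Since that output is long, a small filter can help to find the relevant flags. This is only a minimal sketch, assuming the help text has already been captured as a string; the helper name `find_flags` is hypothetical, and the sample lines are the 0.6.1 flag descriptions quoted elsewhere in this thread.

```python
def find_flags(help_text, keyword):
    """Return the flag names in --helpfull-style output that contain keyword."""
    flags = []
    for line in help_text.splitlines():
        stripped = line.strip()
        # Flag entries start with "--"; continuation lines (defaults) do not.
        if stripped.startswith("--"):
            name = stripped[2:].split(":", 1)[0]
            if keyword in name:
                flags.append(name)
    return flags


sample = """\
  --lm_binary_path: path to the language model binary file created with KenLM
    (default: 'data/lm/lm.binary')
  --lm_trie_path: path to the language model trie file created with native_client/generate_trie
    (default: 'data/lm/trie')
"""
print(find_flags(sample, "lm_"))
```

If no flag shows up for a keyword, that is a hint that the installed version differs from the one the tutorial was written for.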

The tutorials might be for outdated versions; we can’t do anything about that.

I would rather you pinpoint the gaps or send a PR to improve the documentation than write yet another this-is-my-tutorial that will get outdated very soon.

Stanislavs_Davidovics · March 20, 2020, 11:40am

I would like to talk. We can get in touch via WhatsApp: +37127728463

Stanislavs_Davidovics · March 20, 2020, 11:44am

Can you share how you imported lm.binary into your model?

I went through the DeepSpeech --helpfull output, and no flag relates to the binary.

I only found the load flag, which allows using ‘init’ to initialize a fresh model.

@lissyx I would appreciate your help as well

lissyx · March 20, 2020, 11:46am

I have already been helping you a lot. But you really need to be much more specific; I can’t keep up with everybody doing their own stuff and cross-posting everywhere.

This is for 0.6.1. I can’t help you if you don’t properly describe your setup and/or if you keep changing versions.

pj123 · March 20, 2020, 12:06pm

@Stanislavs_Davidovics It seems that you are not using the 0.6.1 version of Mozilla DeepSpeech.

python 0.6.1_DeepSpeech/DeepSpeech/DeepSpeech.py --helpfull
…
--lm_binary_path: path to the language model binary file created with KenLM
(default: 'data/lm/lm.binary')
--lm_trie_path: path to the language model trie file created with native_client/generate_trie
(default: 'data/lm/trie')
…
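
As a rough sketch of how these two flags fit into a 0.6.1 training call, the snippet below only assembles the command line and prints it; it does not run anything. The helper name `build_train_command` and the CSV paths are placeholders for illustration, and the LM paths are simply the defaults shown in the help output above.

```python
import shlex


def build_train_command(train_csv, test_csv,
                        lm_binary="data/lm/lm.binary",
                        lm_trie="data/lm/trie"):
    """Assemble (but do not run) a DeepSpeech.py training command line."""
    return [
        "python", "DeepSpeech.py",
        "--train_files", train_csv,
        "--test_files", test_csv,
        "--lm_binary_path", lm_binary,
        "--lm_trie_path", lm_trie,
    ]


# Hypothetical CSV paths; replace them with your own.
cmd = build_train_command("path/to/train.csv", "path/to/test.csv")
print(" ".join(shlex.quote(part) for part in cmd))
```

With a custom language model, the two LM arguments would point at your own lm.binary and trie instead of the defaults.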

zhangpeng_K · March 23, 2020, 2:43am

@pj123 Hi man, have you trained a model? What about your model’s accuracy? By the way, which language model are you training? I’m training Korean, but I get pretty low accuracy. Could you share some experience? Thanks!