specs:
OS: Ubuntu 18.04
DeepSpeech: 0.6.1
Local clone of the git repo: HEAD detached at v0.6.1
Have I written custom code: No, except that I changed the URL in data/lm/generate_lm.py and raised the 500k word limit to 600k.
I can build my lm.binary without a problem, and I also followed the build steps that I found here.
When I try to generate my trie like this:
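(The paths below are placeholders for my real files; the argument order follows the usage string quoted in the PS at the end of this post.)
/PATH/native_client/generate_trie data/alphabet.txt data/lm/lm.binary data/lm/trie
…it terminates with: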
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::ArrayBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
The decoder requested all the vocabulary strings, but this binary file does not have them. You may need to rebuild the binary file with an updated version of build_binary.
Aborted (core dumped)
I’m not sure what caused this problem (I’m assuming it is not a problem with the DeepSpeech code itself), but could someone help me?
My understanding is that from 0.7.0 onward the trie and LM will be replaced by a single scorer file, but since I’m working at the head of v0.6.1, I assume that is also not the problem? Maybe I cloned the wrong version of KenLM from GitHub?
PS: running /PATH/native_client/generate_trie without arguments gives me the following output: Usage: ../tensorflow/bazel-bin/native_client/generate_trie <alphabet> <lm_model> <trie_path>
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Why do you rebuild generate_trie when we provide it in native_client.tar.xz? That adds a layer of uncertainty.
Could you ls -hal all the vocab / LM files? I suspect one of them was improperly generated.
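For example (the file names here are an assumption; adjust them to whatever your generate_lm.py run actually produced):
ls -hal data/lm/vocab.txt data/lm/lm.arpa data/lm/lm.binary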
I downloaded native_client and used generate_trie to generate the trie file; however, it throws this error:
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::ArrayBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
The decoder requested all the vocabulary strings, but this binary file does not have them. You may need to rebuild the binary file with an updated version of build_binary.
Aborted (core dumped)
How do I fix this?
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
Have you properly created the LM files? Can you reproduce with the released LM files? We need more context …
It’s also very irritating for everyone when you hijack an existing thread to re-ask the same question as the original poster.
If you had paid attention to the thread you are replying to, you would have found the solution.
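If the binary really lacks its vocabulary strings, the error message’s own suggestion applies: rebuild it from the ARPA file with KenLM’s build_binary. A minimal sketch with placeholder paths (DeepSpeech’s generate_lm.py passes additional options on top of this, so prefer reproducing its exact invocation):
kenlm/build/bin/build_binary trie lm.arpa lm.binary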
Thank you! I am working on Mandarin ASR. I created the lm.arpa and lm.binary files using generate_lm.py with the KenLM tools. I found that the KenLM tool I am using is different from the KenLM in native_client/kenlm: mine was installed from GitHub with the command “pip install https://github.com/kpu/kenlm/archive/master.zip”. I also found that in the source tree of DeepSpeech v0.6.1, KenLM cannot be compiled correctly with:
mkdir -p build
cd build
cmake ..
make -j 4
lissyx
((slow to reply) [NOT PROVIDING SUPPORT])
You should not build KenLM from those vendored sources; build it from upstream, as we document.
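For reference, a minimal sketch of the upstream build (the standard steps from https://github.com/kpu/kenlm; extra cmake options omitted):
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build
cd build
cmake ..
make -j 4
The lmplz and build_binary tools then end up in kenlm/build/bin.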