Specs:
OS: Ubuntu 18.04
DeepSpeech: 0.6.1
Local clone of the git: HEAD detached at v0.6.1
Have I written custom code: No, except that I changed the URL in data/lm/generate_lm.py and changed the 500k word limit to 600k.
I can build my lm.binary without a problem, and I followed the build steps that I found here.
When I try to generate my trie like this:
…/tensorflow/bazel-bin/native_client/generate_trie alphabet.txt lm.binary trie
it gives me the following error:
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/model.cc:70 in lm::ngram::detail::GenericModel<Search, VocabularyT>::GenericModel(const char*, const lm::ngram::Config&) [with Search = lm::ngram::trie::TrieSearch<lm::ngram::SeparatelyQuantize, lm::ngram::trie::ArrayBhiksha>; VocabularyT = lm::ngram::SortedVocabulary] threw FormatLoadException because `new_config.enumerate_vocab && !parameters.fixed.has_vocabulary'.
The decoder requested all the vocabulary strings, but this binary file does not have them. You may need to rebuild the binary file with an updated version of build_binary.
Aborted (core dumped)
I’m not sure what has caused this problem (I’m assuming it is not a problem with the DeepSpeech code), but could someone help me?
My understanding is that from 0.7.0 onward the trie and LM will be replaced by a scorer file, but since I’m working from the v0.6.1 tag, I assume this is also not the problem. Maybe I cloned the wrong version of KenLM from GitHub?
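One thing that might be worth checking, going only by the error text: KenLM's build_binary has a -v option that leaves the vocabulary out of the binary file, and a file built that way would lack the strings the decoder asks for. Below is a minimal Python sketch, assuming the build_binary call in generate_lm.py uses the flags shown (-a 255 -q 8 trie). The helper names build_binary_args and omits_vocabulary and the file name lm_filtered.arpa are placeholders for illustration, not part of DeepSpeech, and this is not confirmed to be the cause here.

```python
# Sketch only: the flag list is an assumption modelled on the build_binary
# call in data/lm/generate_lm.py; the paths are hypothetical placeholders.


def build_binary_args(arpa_path, binary_path, build_binary="build_binary"):
    """Return a build_binary command line that keeps the vocabulary."""
    return [build_binary, "-a", "255", "-q", "8", "trie", arpa_path, binary_path]


def omits_vocabulary(args):
    """Return True if the command passes -v, which tells build_binary
    to leave the vocabulary strings out of the binary file."""
    return "-v" in args


if __name__ == "__main__":
    cmd = build_binary_args("lm_filtered.arpa", "lm.binary")
    # Print the command for inspection instead of running it.
    print(" ".join(cmd))
    print("omits vocabulary:", omits_vocabulary(cmd))
```

If the command in the local copy of generate_lm.py does pass -v, rebuilding lm.binary without it and rerunning generate_trie would be a quick test.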
P.S. Running /PATH/native_client/generate_trie without arguments gives me the following output:
Usage: ../tensorflow/bazel-bin/native_client/generate_trie <alphabet> <lm_model> <trie_path>