I’m trying to create a trie and am encountering the following error:
```
terminate called after throwing an instance of 'lm::FormatLoadException'
what(): native_client/kenlm/lm/binary_format.cc:131 in void lm::ngram::MatchCheck(lm::ngram::ModelType, unsigned int, const lm::ngram::Parameters&) threw FormatLoadException.
The binary file was built for probing hash tables but the inference code is trying to load trie with quantization and array-compressed pointers
Aborted (core dumped)
```
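In case it helps with diagnosis: my understanding (from reading lm/model_type.hh in the KenLM source; treat the exact values as my assumption, not authoritative) is that the two formats named in the error are distinct entries in KenLM's ModelType enum, so the binary on disk really is a different type from what the loader expects:

```python
# Assumed mapping based on KenLM's lm/model_type.hh (my reading of the
# source; the exact enum values are an assumption on my part).
MODEL_TYPES = {
    0: "probing hash tables",
    2: "trie",
    3: "trie with quantization",
    4: "trie with array-compressed pointers",
    5: "trie with quantization and array-compressed pointers",
}

built_for = 0   # what build_binary produced (probing appears to be the default type)
expected = 5    # what the inference code asked for, per the error text

print(MODEL_TYPES[built_for], "!=", MODEL_TYPES[expected])
```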
I am following the tutorial here, which was used to make a French model. I was able to build the binary with the following KenLM commands:

```
./lmplz --text vocabulary.txt --arpa words.arpa -o 3
./build_binary -T -s words.arpa language.binary
```
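(Equivalently, lmplz can read from stdin and write to stdout, which is the form the KenLM documentation uses; a sketch, assuming the KenLM binaries are on PATH and vocabulary.txt is in the current directory:)

```shell
# Build a 3-gram ARPA model from the vocabulary text, then binarize it.
lmplz -o 3 < vocabulary.txt > words.arpa
# Note: build_binary's -T option expects a temporary-directory argument,
# so a bare "-T -s" may not parse the way it looks.
build_binary -T /tmp -s words.arpa language.binary
```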
I’m attempting to make the trie as follows:

```
native_client/generate_trie alphabet.txt language.binary vocabulary.txt trie
```
According to this GitHub issue, this may be caused by the switch in the language-model tooling. I changed my binary-generation command by adding “trie” for [type], as recommended in the issue:

```
./build_binary -T -s trie words.arpa language.binary
```
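For what it’s worth, the error message specifically names “quantization and array-compressed pointers”, which as far as I can tell correspond to build_binary’s -q and -a options combined with the trie type; the DeepSpeech language-model instructions I’ve seen build the binary roughly like this (a sketch, and the exact bit widths are from those instructions, not something I’ve verified myself):

```shell
# Build a trie-type binary with 8-bit quantization (-q 8) and
# array-compressed pointers (-a 255), matching the format the error
# message says the inference code expects. -T takes a temp directory.
build_binary -T /tmp -s -a 255 -q 8 trie words.arpa language.binary
```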
That didn’t seem to change anything. I am using the native client master downloaded from TaskCluster (https://index.taskcluster.net/v1/task/project.deepspeech.deepspeech.native_client.master.cpu/artifacts/public/native_client.tar.xz).
I suspect this is a simple flag issue in creating the binary, but I’m not sure what to change.
I should also mention that I’m using a KenLM build separate from my DeepSpeech installation. The KenLM bundled with DeepSpeech can apparently be used for inference, but it can also be built fully to generate the binaries; I didn’t do that, so I’m not sure whether it’s causing a problem.
Thanks for an amazing project!