Error while compiling generate_trie.cpp

Sorry, I was referring to the French model and completely missed this part.
However, when I try it, I get a core dump:

(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**********************************************************************Segmentation fault (core dumped)
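If you are debugging this from Python, as suggested later in the thread, it helps to know what this message means. "Segmentation fault (core dumped)" means the process was killed by SIGSEGV. Python's `subprocess` reports a child killed by a signal as a negative return code (-11 for SIGSEGV). The sketch below uses a hypothetical helper, `run_step`, to turn that into a readable message, so a crash can be told apart from an ordinary error exit. It assumes a POSIX system.

```python
import os
import signal
import subprocess


def run_step(argv):
    """Run one command and describe how it ended (hypothetical helper).

    On POSIX, subprocess reports a child killed by a signal as a negative
    return code, e.g. -11 for SIGSEGV ("Segmentation fault" in a shell).
    """
    name = os.path.basename(argv[0])
    proc = subprocess.run(argv)
    if proc.returncode < 0:
        # Negative return code: the child was terminated by a signal.
        return f"{name} was killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode > 0:
        # Positive return code: the tool exited on its own with an error status.
        return f"{name} exited with status {proc.returncode}"
    return f"{name} succeeded"
```

For example, running the `build_binary` command above through `run_step` should report `build_binary was killed by SIGSEGV` if it crashes the same way.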

So now it's crashing in KenLM… I really cannot help here. You need to figure it out yourself, but you also need to seriously follow the documentation.

Can you try following how generate_lm.py operates here?

../kenlm/build/bin/lmplz --discount_fallback --order 3 --text mirrorfit.txt --arpa words.arpa --prune  0 1
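The two KenLM steps can also be driven from Python, which keeps the flags in one place. This is a minimal sketch, not generate_lm.py itself. The helper `kenlm_commands` and its arguments are hypothetical. The flags are the ones quoted in this thread (`lmplz --discount_fallback --order 3 ... --prune 0 1` and `build_binary -a 255 -q 8 trie`). `kenlm_bin` is assumed to be KenLM's `build/bin` directory.

```python
import os


def kenlm_commands(kenlm_bin, corpus, arpa, binary, order=3):
    """Build argv lists for lmplz and build_binary (hypothetical helper).

    The flags mirror the commands quoted in this thread; kenlm_bin is
    assumed to be KenLM's build/bin directory.
    """
    lmplz = os.path.join(kenlm_bin, "lmplz")
    build_binary = os.path.join(kenlm_bin, "build_binary")
    return [
        # Step 1: estimate an ARPA model from the text corpus.
        [lmplz, "--discount_fallback", "--order", str(order),
         "--text", corpus, "--arpa", arpa, "--prune", "0", "1"],
        # Step 2: convert the ARPA file into a quantized trie binary.
        [build_binary, "-a", "255", "-q", "8", "trie", arpa, binary],
    ]
```

Each list can then be passed to `subprocess.check_call`. That function raises `CalledProcessError` on a non-zero exit, so a failed `lmplz` run stops the pipeline before `build_binary` runs.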

I also see you are working as root. That's unrelated, but it's usually a bad habit.


Thanks a lot, lissyx, you were really helpful. My own foolish mistakes dragged the problem out. Now I have a working language model. I'll write down the steps here in case anyone needs them or runs into the same issue:

  1. Download upper.txt from LibriSpeech.
  2. Add your custom words to it as sentences.
  3. Run generate_lm.py, editing the absolute paths to lmplz and build_binary
    (if it gives errors, run the commands in a Python console to debug).
  4. Generate the trie from the output binary, using generate_trie from native_client.tar.
  5. Use this binary and trie during training and prediction.
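Step 2 amounts to appending lines to a copy of the corpus, since lmplz reads plain text with one sentence per line. The following is a minimal sketch. It assumes an uncompressed upper.txt, and the helper name `add_custom_sentences` and the output path are hypothetical. It copies the corpus rather than editing it in place. It also makes sure the first custom sentence is not glued onto the corpus's last line.

```python
import os
import shutil


def add_custom_sentences(corpus_in, corpus_out, sentences):
    """Copy corpus_in to corpus_out, then append one sentence per line.

    Hypothetical helper; assumes a plain-text, one-sentence-per-line corpus.
    """
    shutil.copyfile(corpus_in, corpus_out)
    with open(corpus_out, "rb+") as out:
        out.seek(0, os.SEEK_END)
        # If the copied corpus lacks a trailing newline, add one so the first
        # custom sentence starts on its own line.
        if out.tell() > 0:
            out.seek(-1, os.SEEK_END)
            if out.read(1) != b"\n":
                out.write(b"\n")
        for sentence in sentences:
            line = " ".join(sentence.split())  # collapse stray whitespace
            if line:  # skip empty entries
                out.write(line.encode("utf-8") + b"\n")
```

The resulting file is what you would pass to lmplz as `--text`.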

So everything works for you? That's excellent news!

Now that you have it working, could you please take some time to review what might be unclear or need documentation improvements?

It'd be awesome if you could at least open an issue about what we have to fix, or even send a PR improving the docs.

I really struggled with the parameters for .arpa and .binary generation (though they were very clear in the generate_lm.py script), since the KenLM repo and the French model tutorial don't use those parameters.

In the end, I manually downloaded upper.txt, added my words, and ran the Python code snippet by snippet (the lmplz and build_binary paths in the script also have to point to KenLM's build). It finally worked.
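Pointing the script at KenLM's binaries is easy to get wrong. Otherwise, a bad path only shows up when `subprocess` fails to launch the tool. Here is a minimal sketch of resolving and checking the paths up front. `resolve_kenlm_tools` is a hypothetical helper, and `kenlm_build_bin` is assumed to be KenLM's `build/bin` directory (`../kenlm/build/bin` in the commands earlier in this thread).

```python
import os


def resolve_kenlm_tools(kenlm_build_bin, tools=("lmplz", "build_binary")):
    """Return absolute paths to KenLM executables, failing early if missing.

    Hypothetical helper; kenlm_build_bin is assumed to be KenLM's build/bin.
    """
    base = os.path.abspath(kenlm_build_bin)
    paths = {}
    for tool in tools:
        path = os.path.join(base, tool)
        # Require a regular file with execute permission.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            raise FileNotFoundError(f"{tool} not found or not executable at {path}")
        paths[tool] = path
    return paths
```

You can then substitute the returned absolute paths for the lmplz and build_binary paths in the script.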