Error while compiling generate_trie.cpp

Nitesh_Tiwari · February 6, 2020, 11:26am

Sorry, I was referring to the French model and completely missed this part.
However, when I try it now, I get a core dump:

(mark3) root@computer:/home/computer/Desktop/mark3/mfit-models# ../kenlm/build/bin/build_binary -a 255 -q 8 trie words.arpa lm.binary
Reading words.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
**********************************************************************Segmentation fault (core dumped)

lissyx · February 6, 2020, 12:10pm

So now it’s crashing in KenLM … I really cannot help here, you need to figure it out yourself, but you also need to seriously follow the documentation.

lissyx · February 6, 2020, 12:39pm

Can you try to follow how generate_lm.py operates here?

../kenlm/build/bin/lmplz --discount_fallback --order 3 --text mirrorfit.txt --arpa words.arpa --prune 0 1

lissyx · February 6, 2020, 12:39pm

I also see you are working as root. That’s unrelated, but it’s usually a bad habit.

Nitesh_Tiwari · February 7, 2020, 9:23am

Thanks a lot, lissyx, you were really helpful. My own foolish mistakes dragged the problem out. Now I have a working language model. I will just note the steps here in case anyone else faces the same issue:

Download upper.txt from LibriSpeech.
Add your custom words as sentences.
Now use generate_lm.py, editing the absolute paths of lmplz and build_binary
(if it gives errors, run the commands in a Python console to debug).
With the output binary, generate the trie using the tool from native_client.tar.
Use this binary and trie during training and prediction.
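
The two KenLM steps above can be sketched roughly as follows. This is only an illustration, not what generate_lm.py actually does: the flag values are the ones quoted earlier in this thread, the function names lm_commands and run_all are made up for the example, and the KenLM path and file names are assumptions you would replace with your own.

```python
import shlex
import subprocess


def lm_commands(kenlm_bin, corpus, arpa="words.arpa", binary="lm.binary"):
    """Return the lmplz and build_binary command lines as argument lists.

    kenlm_bin is the directory holding the KenLM binaries (an assumption:
    adjust it to wherever you built KenLM).
    """
    kenlm_bin = kenlm_bin.rstrip("/")
    lmplz = [
        kenlm_bin + "/lmplz",
        "--discount_fallback",
        "--order", "3",
        "--text", corpus,
        "--arpa", arpa,
        "--prune", "0", "1",
    ]
    build_binary = [
        kenlm_bin + "/build_binary",
        "-a", "255", "-q", "8", "trie",
        arpa, binary,
    ]
    return lmplz, build_binary


def run_all(commands):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        print("running:", " ".join(shlex.quote(arg) for arg in cmd))
        subprocess.run(cmd, check=True)


# Dry run: only print the commands, nothing is executed here.
lmplz_cmd, build_binary_cmd = lm_commands("../kenlm/build/bin", "upper.txt")
for cmd in (lmplz_cmd, build_binary_cmd):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Once the printed commands look right, calling run_all([lmplz_cmd, build_binary_cmd]) runs them one after the other, which is close to trying each command by hand in a Python console.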

lissyx · February 7, 2020, 9:26am

So everything works for you? That’s excellent news!

Now that you have it working, could you please take some time to review what might be unclear or need documentation improvement?

It’d be awesome if you could at least open an issue about what we have to fix, or even send a PR to improve the docs.

Nitesh_Tiwari · February 7, 2020, 10:43am

I really struggled with parameter passing during .arpa and .binary generation (though it was very clear in the generate_lm.py script), since the KenLM repo and the French model tutorial do not use those parameters.

In the end I manually downloaded upper.txt and added my words. Then I ran the Python code snippet by snippet (the paths to lmplz and build_binary in the script also have to point to the KenLM build). It finally worked.
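
For anyone repeating the "added my words" step, here is a minimal sketch of appending custom sentences to the corpus file. The helper name append_sentences is hypothetical, and the sketch assumes the corpus is a plain UTF-8 text file with one sentence per line and that you supply the sentences in the same casing as the rest of the corpus.

```python
from pathlib import Path


def _ends_with_newline(path):
    """Check only the last byte, so a large corpus is never read fully."""
    if path.stat().st_size == 0:
        return True
    with path.open("rb") as f:
        f.seek(-1, 2)  # one byte before the end of the file
        return f.read(1) == b"\n"


def append_sentences(corpus_path, sentences):
    """Append one sentence per line to the corpus, skipping blank entries.

    Returns the number of sentences written.
    """
    path = Path(corpus_path)
    # Collapse repeated whitespace and drop empty entries.
    cleaned = [" ".join(s.split()) for s in sentences if s.strip()]
    needs_newline = path.exists() and not _ends_with_newline(path)
    with path.open("a", encoding="utf-8") as f:
        # Keep the first new sentence from being glued to the last line.
        if needs_newline:
            f.write("\n")
        for sentence in cleaned:
            f.write(sentence + "\n")
    return len(cleaned)
```

A call would look like append_sentences("upper.txt", ["my first custom sentence", "my second custom sentence"]), with real sentences containing your custom words.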