Can I use language modelling tools other than KenLM?

mark2 · January 11, 2018, 10:59am

Hi!

I want to set up a training pipeline for my own audio and text corpus by following the tutorial: TUTORIAL : How I trained a specific french model to control my robot

However, I got stuck building the language model with KenLM (the tutorial uses the command “/bin/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3”), because KenLM requires Boost and I had a lot of problems installing Boost on my Mac.

Is Deep Speech compatible with other language modelling tools such as:
SRILM: https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit
VariKN: https://github.com/vsiivola/variKN
TheanoLM: https://github.com/senarvi/theanolm

so that I am not tied to KenLM?

lissyx · January 11, 2018, 11:24am

We only support KenLM; we even have a specific CTC decoder bound to it. However, if you change the code so that it no longer relies on the specific CTC/KenLM decoder, you might be able to plug in your own. Be prepared to hack, though.
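
For example, one way to try another LM without touching the CTC/KenLM decoder is N-best rescoring, which is a different technique from what DeepSpeech's decoder does: decode without the LM, keep several candidate transcripts, and re-rank them with your own model. The sketch below assumes you can get such a list of (transcript, acoustic log-probability) pairs out of a plain CTC beam search; `rescore_nbest`, `make_toy_bigram_lm`, and the `alpha`/`beta` weights are hypothetical names, and the toy bigram model only stands in for a wrapper around SRILM, VariKN, or TheanoLM.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical plug-in point: any LM that returns a natural-log probability
# for a whole word sequence (e.g. a wrapper around SRILM, VariKN, TheanoLM).
LMScorer = Callable[[Sequence[str]], float]


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_logprob: LMScorer,
    alpha: float = 0.5,
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (transcript, acoustic log-prob) pairs with an external LM.

    combined = acoustic + alpha * lm_logprob(words) + beta * len(words)
    """
    rescored = []
    for text, acoustic in nbest:
        words = text.split()
        combined = acoustic + alpha * lm_logprob(words) + beta * len(words)
        rescored.append((text, combined))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def make_toy_bigram_lm(corpus: List[str]) -> LMScorer:
    """Add-one-smoothed bigram LM; a stand-in for a real LM toolkit."""
    history_counts: Dict[str, int] = {}
    bigram_counts: Dict[Tuple[str, str], int] = {}
    vocab = {"<s>", "</s>"}
    for line in corpus:
        tokens = ["<s>"] + line.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] = history_counts.get(prev, 0) + 1
            bigram_counts[(prev, cur)] = bigram_counts.get((prev, cur), 0) + 1

    def logprob(words: Sequence[str]) -> float:
        # Sum of log P(cur | prev) over the sentence, including <s> and </s>.
        tokens = ["<s>"] + list(words) + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = bigram_counts.get((prev, cur), 0) + 1
            denominator = history_counts.get(prev, 0) + len(vocab)
            total += math.log(numerator / denominator)
        return total

    return logprob


if __name__ == "__main__":
    lm = make_toy_bigram_lm(["turn left", "turn right", "go forward"])
    # Made-up acoustic scores: the acoustics slightly prefer the wrong word.
    candidates = [("turn lift", -4.0), ("turn left", -4.2)]
    print(rescore_nbest(candidates, lm)[0][0])  # prints "turn left"
```

Rescoring is the least invasive option because the decoder stays untouched, but the LM can only choose among candidates the acoustic beam already kept.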

What’s your problem with Boost? Maybe you should start a separate topic; others might be able to help you.

mark2 · January 15, 2018, 7:14am

The problem I ran into while installing Boost from source was this error:
error: no matching constructor for initialization of ‘storage_type’ (aka ‘boost::atomics::detail::storage128_type’)

but I solved it by using version 1.54 instead:

lissyx · January 15, 2018, 8:21am

So now it works for you?

mark2 · January 15, 2018, 8:22am

Yeah, I got KenLM installed, thanks!

krishnamohan.191 · March 21, 2018, 12:30pm

Can you explain how the CTC decoder is bound to KenLM?

lissyx · March 21, 2018, 12:39pm

@reuben implemented specific code so that the CTC decoder's beam search is scored with KenLM.
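
To illustrate the general idea (this is not the actual DeepSpeech code), here is a minimal CTC prefix beam search in which a word-level LM is consulted each time a space completes a word, and prefixes are ranked with an LM weight `alpha` and a word-count bonus `beta`. `prefix_beam_search` and the `word_lm` hook are hypothetical names; in a real setup the hook would query KenLM.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical LM hook standing in for KenLM: P(word | previous words).
WordLM = Callable[[Tuple[str, ...], str], float]


def prefix_beam_search(
    probs: Sequence[Sequence[float]],
    alphabet: str,
    word_lm: WordLM,
    beam_width: int = 16,
    alpha: float = 0.5,
    beta: float = 1.0,
    blank: int = 0,
) -> str:
    """CTC prefix beam search with the LM applied at word boundaries.

    probs[t][k] is the softmax output for frame t and symbol k; index `blank`
    is the CTC blank, every other k maps to the character alphabet[k].
    Note: in this sketch a final word with no trailing space is not LM-scored.
    """

    def rank(item: Tuple[str, List[float]]) -> float:
        # Total prefix probability times a word-count bonus exp(beta * words).
        prefix, (p_b, p_nb) = item
        return (p_b + p_nb) * math.exp(beta * len(prefix.split()))

    # prefix -> [P(prefix, path ends in blank), P(prefix, path ends in non-blank)]
    beams: Dict[str, List[float]] = {"": [1.0, 0.0]}
    for frame in probs:
        nxt: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for k, p in enumerate(frame):
                if k == blank:
                    # A blank leaves the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                ch = alphabet[k]
                ext = prefix + ch
                if prefix and prefix[-1] == ch:
                    # A repeat without a separating blank collapses into the prefix;
                    # only paths ending in blank extend it with a second copy.
                    nxt[prefix][1] += p_nb * p
                    nxt[ext][1] += p_b * p
                elif ch == " " and prefix and not prefix.endswith(" "):
                    # A space closes a word: weight the extension by P_lm ** alpha.
                    words = prefix.split()
                    lm_weight = word_lm(tuple(words[:-1]), words[-1]) ** alpha
                    nxt[ext][1] += (p_b + p_nb) * p * lm_weight
                else:
                    nxt[ext][1] += (p_b + p_nb) * p
        # Keep only the best `beam_width` prefixes.
        beams = dict(sorted(nxt.items(), key=rank, reverse=True)[:beam_width])
    return max(beams.items(), key=rank)[0].strip()


if __name__ == "__main__":
    alphabet = "_ab "  # index 0 is a placeholder for the blank
    probs = [[0.05, 0.5, 0.4, 0.05], [0.05, 0.025, 0.025, 0.9]]
    print(prefix_beam_search(probs, alphabet, lambda h, w: 1.0))  # prints "a"
    print(prefix_beam_search(probs, alphabet, lambda h, w: 1.0 if w == "b" else 0.01))  # prints "b"
```

Because the LM factor is multiplied into the prefix probability during the search, rather than applied afterwards, acoustically plausible but unlikely words can be pruned while the beam is still being built.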

jageshmaharjan · May 7, 2018, 7:40am

Is the Boost library necessary? I installed it anyway with: sudo apt-get install cmake libblkid-dev e2fslibs-dev libboost-all-dev libaudit-dev