Can I use language modelling tools other than KenLM?


(Matti Meikäläinen) #1

Hi!

I want to set up a training pipeline for my own audio and text corpus by following the tutorial: TUTORIAL : How I trained a specific french model to control my robot

However, I got stuck building the language model with KenLM (in the tutorial the command: “/bin/bin/./lmplz --text vocabulary.txt --arpa words.arpa --o 3”) because it requires Boost, and I had a lot of problems installing Boost on my Mac.

Is Deep Speech compatible with other language modelling tools such as:
SRILM: https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit
VariKN: https://github.com/vsiivola/variKN
TheanoLM: https://github.com/senarvi/theanolm

so that I am not bound to KenLM?


(Lissyx) #2

We only support KenLM; we even have a specific CTC decoder bound to it. However, if you change the code so that it no longer relies on the specific CTC/KenLM decoder, you might be able to plug in your own. Be prepared to hack, though.

What’s your problem with Boost? Maybe you should start a specific topic, others might be able to help you?


(Matti Meikäläinen) #3

The problem I ran into while installing Boost from source was this error:
error: no matching constructor for initialization of ‘storage_type’ (aka ‘boost::atomics::detail::storage128_type’)

but I solved it by using Boost version 1.54 instead.


(Lissyx) #4

So now it works for you? :slight_smile:


(Matti Meikäläinen) #5

Yeah, I got KenLM installed :slight_smile: thanks!


(Krishna mohan) #6

Can you explain how the CTC decoder is bound to KenLM?


(Lissyx) #7

@reuben implemented specific code so that the CTC decoder uses KenLM scores when scoring beams.
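
To give a rough idea of what "beam scoring from a language model" can look like, here is a minimal sketch. It is not the actual Deep Speech decoder code. It assumes the common approach of adding a weighted language-model log-probability and a word-count bonus to the acoustic score of each candidate transcript. The toy unigram table `TOY_LM`, the helper names, and the `alpha`/`beta` weights are all hypothetical stand-ins for illustration; in a real setup the language-model score would come from KenLM.

```python
# Hypothetical toy unigram LM standing in for KenLM.
# Values are log10 probabilities, which is the scale KenLM reports.
TOY_LM = {"turn": -1.0, "left": -1.2, "lift": -3.0}
UNKNOWN_LOGPROB = -5.0  # assumed penalty for out-of-vocabulary words


def lm_score(words):
    """Sum per-word log-probabilities from the toy LM."""
    return sum(TOY_LM.get(w, UNKNOWN_LOGPROB) for w in words)


def combined_score(acoustic_logprob, text, alpha=1.0, beta=0.5):
    """Acoustic score + alpha * LM score + beta * word count.

    alpha weights the language model (and absorbs any difference in
    log base); beta is a per-word bonus that offsets the LM's bias
    toward shorter transcripts.
    """
    words = text.split()
    return acoustic_logprob + alpha * lm_score(words) + beta * len(words)


def rescore(beams, alpha=1.0, beta=0.5):
    """Sort (text, acoustic_logprob) candidates, best first."""
    return sorted(
        beams,
        key=lambda b: combined_score(b[1], b[0], alpha, beta),
        reverse=True,
    )


# The acoustic model slightly prefers "turn lift", but the LM
# considers "turn left" far more likely, so it wins after rescoring.
beams = [("turn lift", -2.0), ("turn left", -2.3)]
best = rescore(beams)[0][0]
print(best)
```

A real decoder applies this kind of scoring incrementally while the beams are being extended, rather than once at the end, but the combination of scores follows the same idea.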


(Jageshmaharjan) #8

Is the Boost library necessary? In any case, I installed it with: sudo apt-get install cmake libblkid-dev e2fslibs-dev libboost-all-dev libaudit-dev