What is the significance of lm alpha and lm beta?

I am trying to limit out-of-vocabulary words for my custom LM, but to do that I need to understand exactly what lm alpha and lm beta stand for.

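They control how much the language model influences decoding: lm alpha is the language model weight and lm beta is the word insertion weight. util/flags.py gives more context, and you can read the same descriptions with --helpfull. Please let us know if it is still unclear.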

This may be because I am out of practice in C and C++, but can someone please explain how the alpha and beta values actually affect the decoder?

I traced it to here: https://github.com/mozilla/DeepSpeech/blob/1eaec6eb5e92323b5d97bdfa6e41502179bfe8a1/native_client/ctcdecode/scorer.h#L99

They then get set here:

But they never seem to actually be used. If they are, where?
Does path_trie.cpp use them somehow?

Thank you!

Here?

native_client/ctcdecode/ctc_beam_search_decoder.cpp:        score = ext_scorer_->get_log_cond_prob(ngram, bos) * ext_scorer_->alpha;
native_client/ctcdecode/ctc_beam_search_decoder.cpp:      approx_ctc -= (ext_scorer_->get_sent_log_prob(words)) * ext_scorer_->alpha;

Well shoot. Must have missed that when searching for the keyword alpha in the repo. Thank you!