Setting language model weight to 0 gives different results for different language models

Sure. I used master and the audioTranscript_cmd.py script. I added an evaluate function and modified load_model (the last parameter is lm_weight):

This code does a lot of other things; can you please stick to trivial ones like native_client/python/client.py?

Hi @lissyx, I’ve got the same question:

I have just changed the following in native_client/python/client.py:

# The alpha hyperparameter of the CTC decoder. Language Model weight
LM_ALPHA = 0
# The beta hyperparameter of the CTC decoder. Word insertion bonus.
LM_BETA = 0

and afterwards used different LMs, like:

python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --lm lm_trie_vocab/lm.binary --trie lm_trie_vocab/trie --extended
and
python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --lm lm/lm.binary --trie lm/trie --extended
and I got different outputs. How can that happen? Is there maybe another way to not use the LM here? It seems like it’s not as easy to exclude the scorer as it is in DeepSpeech.py, because it is using the binary here?! Thanks already

Oh, I can actually answer my own question for everyone else who has this problem: in native_client/python/client.py it is not necessary to use an LM, and in contrast to DeepSpeech.py there is no default setting! So simply do not use the flags and it won’t take an LM into account, I guess, because (in client.py):

if args.lm and args.trie:
    print('Loading language model from files {} {}'.format(args.lm, args.trie), file=sys.stderr)
    lm_load_start = timer()
    ds.enableDecoderWithLM(args.alphabet, args.lm, args.trie, LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)
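
So with the same model and audio file as above, simply dropping the --lm and --trie flags should do it, if I read client.py correctly:

python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --extended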

It would have been useful to share those.

Yes.

OK, true, but it is still weird. Maybe the different tries have something to do with it?

Output 1: (lm_trie_vocab)

Loading language model from files lm_trie_vocab/lm.binary lm_trie_vocab/trie
Loaded language model in 0.004s.
Running inference.
2019-08-15 12:28:26.337842: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
probieren wir es mal mit den aktuellen steg deutschen worten zum einen gibt es die donau darm schiff fahrt gesellschaft kapitaen wir wird welcher de rind fleisch etikett er uns ueber wach uns aufgaben ueber trage uns gesetz platz machen musste was fuer toll woerter
Inference took 4.178s for 14.282s audio file.

Output 2: (lm)

Loading language model from files lm/lm.binary lm/trie
Loaded language model in 0.000161s.
Running inference.
2019-08-15 12:32:49.445661: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
wir wir es mal mit den aktuell laengsten deutschen wir den dem einen gibt es die deutschen es welche den wir wir welche dem den welche die den fuer machen fuer laengsten machen mit es fuer tolle wir to
Inference took 4.409s for 14.282s audio file.

Both are German language models, one built from lots of text and one containing just the training text files.

@ena.1994 I don’t understand: you run inference with two different sets of LM + trie, so why are you surprised that the final decoding is different?

I am surprised because I was thinking that setting both LM hyperparameters to ZERO would lead to not using the LM, even though one is passed in both cases. In both cases those hyperparameters were LM_BETA = 0 and LM_ALPHA = 0. Like @Kirill asked before.

Now your statement is confusing. What is #9 about? You said you found out how to not enable the LM? And then you compare with two different LMs? And then you talk about LM weights?

Sorry for the confusion! I hope I can make this clear: first I wanted to not use the LM by setting the weights to zero; afterwards I found out it’s easier to simply not use the flags, and that works fine. So I don’t really have a problem here anymore. BUT the original question was: why is client.py using an LM even though the weights are both set to zero? That must be the case, because otherwise the outputs should be the same, as they wouldn’t really be using the (different) LMs. Maybe it is a bug, OR I am misunderstanding the role of the LM_ALPHA and LM_BETA hyperparameters.
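
For reference, my understanding of how the two weights usually enter the beam score (following the original Deep Speech paper; the actual 0.5.x decoder internals may of course differ) is:

score(y) = log P_acoustic(y | x) + LM_ALPHA * log P_LM(y) + LM_BETA * word_count(y)

so with both set to zero I expected the whole LM contribution to drop out.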

Can you explain thoroughly all your steps for testing LM_ALPHA and LM_BETA? I might have got a clue while taking my shower …

@lissyx,
I have the same thing.
I trained my own non-English model using v0.5.1. Setting alpha and beta to 0 when running evaluate.py generates different results with different language models on the same test data.

The only way that gave me consistent results without a language model was to assign scorer=None.

Can you share more context / examples of the variations?

@lissyx, context like what? I would like to help.

I am using special Unicode characters in my transcripts and alphabet, which are different from the English alphabet.

I have tested two different language models; each was tested using:

  1. Default values, lm_alpha=0.75 and lm_beta=1.85: results were acceptable, and each word in the decoded results belongs to the language model used.

  2. Then each was tested with lm_alpha=0 and lm_beta=0: the decoded words still belonged only to the language model used, and different language models generated different results.

  3. Tested by removing the ‘scorer’ line from the ‘evaluate.py’ code and putting ‘scorer=None’ instead: different language models generate the same results, and the decoded words do not necessarily belong to any of the language models used. However, this method needs a lot of time.

Then I tested with a language model that is not related to my work and data: I used the paths of one of the provided English LMs (binary and trie). Assigning 0 to lm_alpha and lm_beta generates unacceptable results.

Could you help me with the best way to disable the language model in the final decoding results?

I am using v0.5.1.
I tested the above using evaluate.py, as well as doing single-shot inference from DeepSpeech.py.

I don’t see the point here.

Please, share examples.

Could you explain the use case here? Are you trying to achieve something? Or just debugging the different results when LM weights are 0?

“garbage unaccepted results”, again, it would be nice if you shared examples …

Without anything that we can reproduce on our side, it’s going to be complicated to fix.

Setting lm_alpha and lm_beta to 0 is not a suitable way to disable LM scoring. As some have already mentioned here, the clients only enable the LM if you pass the flags. As for the Python code, just pass scorer=None to the decoder calls.
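
For the evaluate.py path in 0.5.x, the relevant lines look roughly like this (a sketch from memory; batch_logits, batch_lengths, Config.alphabet and FLAGS.beam_width come from the surrounding evaluate.py code, so double-check against your checkout):

from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer

# Default behaviour: build a Scorer from the LM binary and trie ...
# scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
#                 FLAGS.lm_binary_path, FLAGS.lm_trie_path,
#                 Config.alphabet)

# ... to fully disable the LM, skip the Scorer and hand scorer=None
# to the decoder instead:
decoded = ctc_beam_search_decoder_batch(batch_logits, batch_lengths,
                                        Config.alphabet, FLAGS.beam_width,
                                        num_processes=num_processes,
                                        scorer=None)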


run_singleshot_clean_final_v3.sh data/recorded/po6vfbxbnyduz0k9.wav

with --lm_alpha 0.75 --lm_beta 1.85:

ءَلَيهِم وَيَتُووبِ عَلَيهِم وَللَدِيييين

with --lm_alpha 0 --lm_beta 0:

ءَلَيهِم جَيلَ زُوو فِعَلَيهِم وَلَلءِيين

with scorer=None:

ءَلَيهِم جَيمَڧزُوو بِعَلَيهِم وَلَڟڟَآآللِيييين

Those are right-to-left; however, they are special characters which are not readable anyway :smiley:

I just need to test my model without any language model and without restricting it to any bag of words. The second option was not satisfying for me regarding the generated results and does not seem to discard the LM/trie. The third option was very satisfying, but it is very slow.

Thank you.

Great, you confirmed what I saw in my test results.
I am very satisfied with the results of scorer=None, but it takes a long time to finish decoding: instead of 00:30 when using the scorer, 12:30 hours were needed with scorer=None on my test data. I am using the default --beam_width value.

Any help in this would be much appreciated.
Thank you @reuben

This is just one result. Having different results depending on the values of alpha and beta is expected. So far, it seems you imply that over subsequent runs with 0.0 for both values, you get different decoding. This is what I’m curious about.

The LM and trie play a part in the speed of the decoding; that’s expected. Try reducing the beam width.
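
For example, something like this (evaluate.py reads --beam_width from util/flags.py; the other values are just placeholders for your own setup):

python evaluate.py --test_files <your_test.csv> --checkpoint_dir <your_checkpoint_dir> --beam_width 128

A smaller beam makes the no-scorer decoding much faster, at the cost of some accuracy.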
