Setting language model weight to 0 gives different results for different language models

Sure. I used master and the audioTranscript_cmd.py script. I added an evaluate function and modified load_model (the last parameter is lm_weight):

This code does a lot of other things; can you please stick to trivial ones like native_client/python/client.py?

Hi @lissyx, I’ve got the same question:

I have just changed the following in native_client/python/client.py:

# The alpha hyperparameter of the CTC decoder. Language Model weight
LM_ALPHA = 0
# The beta hyperparameter of the CTC decoder. Word insertion bonus.
LM_BETA = 0

and afterwards used different LMs, like:

python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --lm lm_trie_vocab/lm.binary --trie lm_trie_vocab/trie --extended
and
python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --lm lm/lm.binary --trie lm/trie --extended
and I got different outputs. How can that happen? Is there maybe another way to not use the LM here? It seems like it’s not as easy to exclude the scorer as it is in DeepSpeech.py, because it is using the binary here?! Thanks already

Oh, I can actually answer my own question for everyone else who has this problem: in native_client/python/client.py it is not necessary to use an LM, and in contrast to DeepSpeech.py there is no default setting! So simply do not use the flags and it won’t take an LM into account, I guess, because (in client.py):

if args.lm and args.trie:
    print('Loading language model from files {} {}'.format(args.lm, args.trie), file=sys.stderr)
    lm_load_start = timer()
    ds.enableDecoderWithLM(args.alphabet, args.lm, args.trie, LM_ALPHA, LM_BETA)
    lm_load_end = timer() - lm_load_start
    print('Loaded language model in {:.3}s.'.format(lm_load_end), file=sys.stderr)
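
So with the same model and audio file as above, simply dropping the --lm and --trie flags should do it, if I read client.py correctly:

python native_client/python/client.py --model training_accurate/export_dir3/output_graph.pb --alphabet deepspeech-0.5.0/model/alphabet.txt --audio $FILE1 --extended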

It would have been useful to share those.

Yes.

OK, true, but it is still weird. Maybe the different tries have something to do with it?

Output 1: (lm_trie_vocab)

Loading language model from files lm_trie_vocab/lm.binary lm_trie_vocab/trie
Loaded language model in 0.004s.
Running inference.
2019-08-15 12:28:26.337842: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
probieren wir es mal mit den aktuellen steg deutschen worten zum einen gibt es die donau darm schiff fahrt gesellschaft kapitaen wir wird welcher de rind fleisch etikett er uns ueber wach uns aufgaben ueber trage uns gesetz platz machen musste was fuer toll woerter
Inference took 4.178s for 14.282s audio file.

Output 2: (lm)

Loading language model from files lm/lm.binary lm/trie
Loaded language model in 0.000161s.
Running inference.
2019-08-15 12:32:49.445661: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
wir wir es mal mit den aktuell laengsten deutschen wir den dem einen gibt es die deutschen es welche den wir wir welche dem den welche die den fuer machen fuer laengsten machen mit es fuer tolle wir to
Inference took 4.409s for 14.282s audio file.

Both are German language models, one built from lots of text and one containing just the training text files.

@ena.1994 I don’t understand: you run inference with two different sets of LM + trie, so why are you surprised that the final decoding is different?

I am surprised because I was thinking that setting both LM hyperparameters to ZERO would lead to not using the LM, even though one is passed in both cases. In both cases those hyperparameters were LM_BETA = 0 and LM_ALPHA = 0. Like @Kirill asked before.

Now your statement is confusing. What is #9 about? You said you found out how to not enable the LM? And then you compare with two different LMs? And then you talk about LM weights?

Sorry for the confusion! I hope I can make this clear: first I wanted to not use the LM by setting the weights to zero; afterwards I found out it’s easier to simply not use the flags, and that works fine. So I don’t really have a problem here anymore. BUT the original question was: why is client.py using an LM even though the weights are both set to zero? That must be the case, because otherwise the outputs should be the same, as they wouldn’t really be using the (different) LMs. Maybe it is a bug, OR I am misunderstanding the role of the LM_ALPHA and LM_BETA hyperparameters.
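
For reference, my understanding of how the two weights usually enter the beam score (following the original Deep Speech paper; the actual 0.5.x decoder internals may of course differ) is:

score(y) = log P_acoustic(y | x) + LM_ALPHA * log P_LM(y) + LM_BETA * word_count(y)

so with both set to zero I expected the whole LM contribution to drop out.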

Can you explain thoroughly all your steps for testing LM_ALPHA and LM_BETA? I might have got a clue while taking my shower …

@lissyx,
I have the same thing.
I trained my own non-English model using v0.5.1. Setting alpha and beta to 0 when running evaluate.py generates different results with different language models on the same test data.

The only way that gave me consistent results without a language model was to assign scorer=None.

Can you share more context / examples of the variations?

@lissyx, context like what? I would like to help.

I am using special Unicode characters in my transcripts and alphabet, which are different from the English alphabet.

I have tested two different language models; each was tested using:

  1. Default values, lm_alpha=0.75 and lm_beta=1.85: results were acceptable, and each word in the decoded results belongs to the language model used.

  2. Then each was tested with lm_alpha=0 and lm_beta=0: the decoded words still belonged only to the language model used, and different language models generated different results.

  3. Tested by removing the ‘scorer’ line from the ‘evaluate.py’ code and putting ‘scorer=None’ instead: different language models generate the same results, and the decoded words do not necessarily belong to any of the language models used. However, this method needs a lot of time.

Then I tested with a language model that is not related to my work and data: I used the paths of one of the provided English LMs (binary and trie). Assigning 0 to lm_alpha and lm_beta generates unacceptable results.

Could you help me with the best way to disable the language model in the final decoding results?

I am using v0.5.1.
I tested the above using evaluate.py, as well as doing single-shot inference from DeepSpeech.py.

I don’t see the point here.

Please, share examples.

Could you explain the use case here? Are you trying to achieve something? Or just debugging the different results when LM weights are 0?

“garbage unaccepted results”, again, it would be nice if you shared examples …

Without anything that we can reproduce on our side, it’s going to be complicated to fix.

Setting lm_alpha and lm_beta to 0 is not a suitable way to disable LM scoring. As some have already mentioned here, the clients only enable the LM if you pass the flags. As for the Python code, just pass scorer=None to the decoder calls.
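
For the evaluate.py path in 0.5.x, the relevant lines look roughly like this (a sketch from memory; batch_logits, batch_lengths, Config.alphabet and FLAGS.beam_width come from the surrounding evaluate.py code, so double-check against your checkout):

from ds_ctcdecoder import ctc_beam_search_decoder_batch, Scorer

# Default behaviour: build a Scorer from the LM binary and trie ...
# scorer = Scorer(FLAGS.lm_alpha, FLAGS.lm_beta,
#                 FLAGS.lm_binary_path, FLAGS.lm_trie_path,
#                 Config.alphabet)

# ... to fully disable the LM, skip the Scorer and hand scorer=None
# to the decoder instead:
decoded = ctc_beam_search_decoder_batch(batch_logits, batch_lengths,
                                        Config.alphabet, FLAGS.beam_width,
                                        num_processes=num_processes,
                                        scorer=None)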


run_singleshot_clean_final_v3.sh data/recorded/po6vfbxbnyduz0k9.wav

with --lm_alpha 0.75 --lm_beta 1.85:

ءَلَيهِم وَيَتُووبِ عَلَيهِم وَللَدِيييين

with --lm_alpha 0 --lm_beta 0:

ءَلَيهِم جَيلَ زُوو فِعَلَيهِم وَلَلءِيين

with scorer=None:

ءَلَيهِم جَيمَڧزُوو بِعَلَيهِم وَلَڟڟَآآللِيييين

Those are right-to-left; however, they are special characters which are not readable anyway :smiley:

I just need to test my model without any language model and without restricting it to any bag of words. The second option was not satisfying for me regarding the generated results and does not seem to discard the LM/trie. The third option was very satisfying, but it is very slow.

Thank you.

Great, you confirmed what I saw in my test results.
I am very satisfied with the results of scorer=None, but it takes a long time to finish decoding: instead of 00:30 when using the scorer, 12:30 hours were needed with scorer=None on my test data. I am using the default --beam_width value.

Any help in this would be much appreciated.
Thank you @reuben

This is just one result. Having different results depending on the values of alpha and beta is expected. So far, it seems you imply that over subsequent runs with 0.0 for both values, you get different decoding. This is what I’m curious about.

The LM and trie play a part in the speed of the decoding; that’s expected. Try reducing the beam width.
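
For example, something like this (evaluate.py reads --beam_width from util/flags.py; the other values are just placeholders for your own setup):

python evaluate.py --test_files <your_test.csv> --checkpoint_dir <your_checkpoint_dir> --beam_width 128

A smaller beam makes the no-scorer decoding much faster, at the cost of some accuracy.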
