Language Model Tuning

Hello,

Can anyone recommend sources of information for tuning the language model? Specifically, the order of the language model (the -o flag) is set to 5-grams in Kenneth Heafield's KenLM documentation, but I can't find much guidance on when it's recommended to change it.

If I’m transcribing shorter utterances, should I set this order value to a lower number?

If I'm fine-tuning on a new domain-specific audio set, should I keep the original LM provided by Mozilla or create my own?

I'm not expecting anyone to solve my problem, but aside from lecture slides on chain-rule probabilities and language models, I can't find many resources for this last step. Thanks.

Hello!

About your questions:

"If I'm fine-tuning on a new domain-specific audio set, should I keep the original LM provided by Mozilla or create my own?"

A: If you can, build your own from text where the domain-specific words occur … that gives the best results. Of course, "general" sentences are also needed.

Length of n-grams … I recommend you test them for your case… shorter sentences can do well with shorter n-grams, but in my case, for example, 2-grams were too short to capture context… (My case: speech-to-text for phone conversations.)
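If it helps, here is a minimal sketch of how you might compare orders empirically: build LMs at a few orders and score a held-out, in-domain text by perplexity. It assumes KenLM's lmplz binary is on your PATH and the kenlm Python package is installed; corpus.txt and dev.txt are placeholder file names.

```python
# Hypothetical sketch: compare KenLM orders by dev-set perplexity.
# Assumes lmplz is on PATH and the `kenlm` Python package is installed;
# corpus.txt / dev.txt are placeholders for your own data.
import subprocess
import kenlm

dev_sentences = [line.strip() for line in open("dev.txt") if line.strip()]

for order in (2, 3, 4, 5):
    arpa = f"lm_o{order}.arpa"
    # Build an n-gram LM of the given order from the training text.
    with open("corpus.txt") as src, open(arpa, "w") as dst:
        subprocess.run(["lmplz", "-o", str(order)], stdin=src, stdout=dst, check=True)
    model = kenlm.Model(arpa)
    # Lower mean perplexity on held-out, in-domain text is better.
    ppl = sum(model.perplexity(s) for s in dev_sentences) / len(dev_sentences)
    print(f"order={order}: mean dev perplexity {ppl:.1f}")
```

Perplexity is only a proxy for WER, so it's worth confirming the best order end-to-end on the decoder as well.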

Hope this helps!

Thank you! I’ll run some experiments with that in mind.

@tuttlebr, there is also the option of trying something like this (called interpolation):

Final LM = W1 * Mozilla LM + W2 * Customized Domain-Specific LM
(Note: W1 + W2 = 1)

I am trying to build something along these lines. SRILM supports this pretty nicely. The tricky part is how to tune W1 and W2… So the pipeline is: CTC output, weights, LM, and then fine-tuning the weights. Has anyone worked on the same?
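For what it's worth, here is a rough sketch of one way to tune the weights: linearly interpolate per-token probabilities from the two models and grid-search W1 against dev-set perplexity. It assumes the kenlm Python package; the model and data paths are placeholders. Once you've picked a weight, SRILM's `ngram -lm general.lm -mix-lm domain.lm -lambda W1 -write-lm mixed.lm` can write out the merged model.

```python
# Hypothetical sketch: grid-search the interpolation weight W1 on
# dev-set perplexity. Assumes the `kenlm` Python package; the model
# and data paths below are placeholders.
import math
import kenlm

general = kenlm.Model("mozilla_lm.binary")        # placeholder path
domain = kenlm.Model("domain_lm.binary")          # placeholder path
dev = [line.strip() for line in open("dev.txt") if line.strip()]

def mixed_log10_prob(sentence, w1):
    """Per-token linear interpolation: P = W1*P_general + W2*P_domain."""
    total, n = 0.0, 0
    pairs = zip(general.full_scores(sentence), domain.full_scores(sentence))
    for (lp_g, _, _), (lp_d, _, _) in pairs:
        p = w1 * 10.0 ** lp_g + (1.0 - w1) * 10.0 ** lp_d
        total += math.log10(p)
        n += 1
    return total, n

best = None
for w1 in [i / 10 for i in range(11)]:  # W1 in {0.0, 0.1, ..., 1.0}
    log_prob, tokens = 0.0, 0
    for s in dev:
        lp, n = mixed_log10_prob(s, w1)
        log_prob += lp
        tokens += n
    ppl = 10.0 ** (-log_prob / tokens)  # mixture perplexity on dev
    print(f"W1={w1:.1f}: perplexity {ppl:.1f}")
    if best is None or ppl < best[1]:
        best = (w1, ppl)
print("best W1:", best[0])
```

Note this optimizes perplexity, not the final WER; to fine-tune the weights against the CTC output directly you would have to re-run the decoder for each candidate weight.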

Any updates would be really helpful.

Thank you, @sayantangangs.911! I have gained a slight improvement in WER by using a domain-specific language model where the -o value was the average word count per sample in my training-data utterances.
WER: 19.75% to 19.1%
1:1 Match: 69% to 71.3%
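In case anyone wants to replicate that heuristic, this is roughly how I compute the average word count from a DeepSpeech-style training CSV (assuming a transcript column; train.csv is a placeholder name):

```python
# Minimal sketch: choose the LM order from the average utterance length.
# Assumes a DeepSpeech-style training CSV with a `transcript` column;
# train.csv is a placeholder name.
import csv

with open("train.csv", newline="") as f:
    lengths = [len(row["transcript"].split()) for row in csv.DictReader(f)]

avg_words = sum(lengths) / len(lengths)
print(f"average words per utterance: {avg_words:.1f}")
print(f"suggested -o (rounded): {round(avg_words)}")
```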

I am also beginning to gradually increase the lm_alpha parameter and have seen some improvement on a holdout set of examples that previously had a very high WER.

WER: 19.1% to 18.28%
1:1 Match: 71.3% to 72.1%

Can anyone elaborate on how they have modified lm_alpha in their own models?

Hey @tuttlebr, where are you changing lm_alpha? I mean, isn't it meant to be set only during training? Once the model is ready, how and where should lm_alpha be changed? Are you directly changing the values in the client.py file?

I don't believe the language model is used in training, only inference. I wrote my own script, inspired by the examples on GitHub, which allows you to modify this parameter. You can also modify the beam width and the lm_beta (word insertion) parameter. Alternatively, you can change these through util/flags.py via the --lm_alpha and --lm_beta command-line flags.
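For reference, the core of my script looks roughly like this. It assumes DeepSpeech 0.6.x, where the Python API is Model(model_path, beam_width) plus enableDecoderWithLM(lm_path, trie_path, lm_alpha, lm_beta); the paths, holdout data, and alpha grid are placeholders.

```python
# Rough sketch of an lm_alpha sweep at inference time. Assumes the
# DeepSpeech 0.6.x Python bindings; all paths/data are placeholders.
import wave
import numpy as np
from deepspeech import Model

BEAM_WIDTH = 500
LM_BETA = 1.85  # held fixed here; sweep it the same way if needed

def read_wav(path):
    # DeepSpeech expects 16-bit PCM at the model's sample rate.
    with wave.open(path, "rb") as w:
        return np.frombuffer(w.readframes(w.getnframes()), np.int16)

def wer(ref, hyp):
    """Word error rate via edit distance over word sequences."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0] = np.arange(len(r) + 1)
    d[0, :] = np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + cost)
    return d[len(r), len(h)] / max(len(r), 1)

holdout = [("sample1.wav", "expected transcript one"),
           ("sample2.wav", "expected transcript two")]  # placeholder data

for lm_alpha in (0.5, 0.75, 1.0, 1.25, 1.5):
    ds = Model("output_graph.pbmm", BEAM_WIDTH)
    ds.enableDecoderWithLM("lm.binary", "trie", lm_alpha, LM_BETA)
    scores = [wer(ref, ds.stt(read_wav(path))) for path, ref in holdout]
    print(f"lm_alpha={lm_alpha}: mean WER {sum(scores) / len(scores):.3f}")
```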

I understand. Since I'm currently just testing things out, I'm using DeepSpeech's command line, and to effect a change I'm having to edit exactly this parameter.

Regarding the effect of the LM in training: while it's indeed not used there, I felt the lm_alpha and word-insertion parameters would affect it to some extent. But I understand it's used more in the decoding phase.

An important point, though: could someone share some production-level guidance on LMs?

Thanks a lot @tuttlebr