As we know, the language model vastly influences the quality of inference.
I am considering how to minimise the influence of the language model's quality on validation results during training. Is it a good idea to build an LM from only the texts available in the training data, just for training (validation specifically, since the LM is not used during training itself)? I assume this would give proper inference results for validation. Afterwards I would build an LM for inference with the trained model in the actual application. Does this make sense?
You do validation because you want to be sure that you are not overfitting. Using an LM in the validation step, which is also quite expensive, will not tell you anything new about the state of your fit. If you calculate your loss after applying the language model, it may well be lower, but it will follow the same trend as if you hadn't used it.
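To make the point above concrete, here is a minimal sketch of how an LM typically enters a CTC beam-search decoder via shallow fusion. The function name `combined_score` and the weight values are illustrative, not DeepSpeech's actual API; DeepSpeech does expose analogous `alpha`/`beta` decoder weights. The key is that the LM only rescores candidate transcripts at decode time, while the training/validation loss is the acoustic term alone:

```python
# Hypothetical sketch of shallow-fusion scoring in CTC beam-search
# decoding. Names and weight values are illustrative.
def combined_score(acoustic_logprob, lm_logprob, word_count,
                   alpha=0.75, beta=1.85):
    # alpha weights the LM log-probability; beta rewards word insertions.
    # The validation loss is computed from acoustic_logprob alone,
    # before this rescoring step, so the LM cannot change its trend.
    return acoustic_logprob + alpha * lm_logprob + beta * word_count

# Two candidate transcripts: the LM shifts their relative scores,
# but the acoustic term used for the loss is untouched.
print(combined_score(-12.3, -5.1, 3))
print(combined_score(-11.8, -9.4, 3))
```

With `alpha=0` and `beta=0` the combined score reduces to the bare acoustic log-probability, which is exactly what the validation loss sees.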
May I ask how Mozilla's 0.6.1 model was trained with regard to the LM?
Was the LM used for validation at all? If so, did it contain all the texts from the training set as well, or did you use only the provided trie and lm.binary? I haven't found any information about this in the release notes.
lissyx
No, the LM is only used during test set evaluation.
Everything needed to rebuild the LM is provided under data/lm.