Predicting mobile number

Hello @lissyx @reuben, I have created a language model for my specific problem and trained the model on Common Voice English data; my model had 1.3 loss on the training set.
The prediction is working quite well, but I am confused about how to deal with mobile numbers in the language model. I have written every single number from zero to nine (in words) line by line, but I don't think single words are helping the model predict better, so what can I do?
I thought the other method could be writing 10-digit numbers in 9! arrangements. Can you suggest what I can do?

Adding `--discount_fallback` during the build of the language model is a start.

Also, check the KenLM GitHub for others with the same problem.

Thank you for the reply, but I have already added `--discount_fallback`.
Actually, I want to convert speech to text for anyone saying their mobile number.
The problem is I am confused whether writing numbers like, for example:
double
triple
zero
one
two

nine
is the right format, or whether I should include zero to nine in a single sentence with different combinations?
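Whichever convention you choose, it helps to apply it consistently when generating the training corpus. As a sketch (the “double”/“triple” convention for repeated digits is an assumption; adapt it to however your speakers actually read numbers out), here is one way to normalize a digit string into its spoken form:

```python
# Sketch: normalize a digit string into spoken words, collapsing
# repeated digits into "double"/"triple". The repeat convention is an
# assumption -- pick one convention and use it consistently for both
# the LM training corpus and any post-processing of the transcript.

DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}

def spell_number(digits: str) -> str:
    words = []
    i = 0
    while i < len(digits):
        # find the length of the run of identical digits starting at i
        run = 1
        while i + run < len(digits) and digits[i + run] == digits[i]:
            run += 1
        word = DIGIT_WORDS[digits[i]]
        if run == 2:
            words.append("double " + word)
        elif run == 3:
            words.append("triple " + word)
        else:
            words.extend([word] * run)  # single digits, or runs longer than 3
        i += run
    return " ".join(words)

print(spell_number("9800122333"))
# nine eight double zero one double two triple three
```

Running this over a list of sampled (not exhaustively enumerated) phone numbers gives you corpus sentences in one consistent format.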

I would not train on every possible combination of phone numbers. The KenLM approach assumes the corpus hints at which words generally appear next to each other, and it builds a probability score from that.

You can tell which of these is more likely:
“I would like to help”
“I to like would help”
The language model would figure out that the more likely ordering of this n-gram is “I would like to help” given its chain probability score.
If you train on all combinatorial numbers, you're not really helping the language model learn which number sequences are correct, since every possible sequence has been fed in during training. I think.
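The chain-probability idea above can be shown with a toy bigram model. This is only an illustration (real KenLM adds smoothing and backoff, and works in log space); the tiny corpus here is made up:

```python
# Toy bigram model: count adjacent word pairs in a tiny corpus, then
# score a sentence by multiplying conditional probabilities P(w2 | w1).
# Natural orderings reuse seen bigrams and score higher; scrambled
# orderings hit unseen bigrams and collapse to zero (no smoothing here).
from collections import Counter

corpus = [
    "i would like to help",
    "i would like to call",
    "would you like to help",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def score(sentence: str) -> float:
    words = sentence.split()
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        # unsmoothed conditional probability P(w2 | w1)
        p *= bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
    return p

print(score("i would like to help"))  # positive: all bigrams were seen
print(score("i to like would help"))  # 0.0: contains unseen bigrams
```

If every digit permutation appears in the corpus, every digit bigram gets a similar count, so the model has no ordering preference left to learn — which is the point made above.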

I’m curious what your experimentation shows. Phone numbers and addresses may benefit from separate models. I remember reading on this Discourse about a user trying to transcribe voicemails. Perhaps they have some options as well.