Hi guys, I am transcribing audio files that are each about two minutes long. I have customized the language model using my own vocabulary. I have a few questions.
What should the order of my language model (KenLM) be?
My audio files contain number words like zero, one, two, three … nine, twenty, thirty, forty, fifty, etc., and letters a, b, c, d, e, f … z.
Example transcript:
Your account number is four w nine seven ten five p eight h zero zero zero f b you have transferred five thousand nine hundred sixty eight dollars and ninety cents your remaining balance is four thousand eighty six dollars and forty cents.
Should I change the alpha and beta parameters?
I have tried to find the optimal values of alpha and beta, but there is a trade-off between certain words: improving some makes others worse. It is not working as I expected.
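To make the alpha/beta search concrete, here is a minimal sketch of a grid search that picks the pair with the lowest word error rate (WER) over a set of reference transcripts. The `transcribe(audio_file, alpha, beta)` callable is a hypothetical stand-in for whatever decoder call is actually used, and the function names are my own for illustration, not part of any library.

```python
from itertools import product


def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


def grid_search(transcribe, audio_files, references, alphas, betas):
    """Return (best_wer, alpha, beta) over all alpha/beta combinations.

    `transcribe` is a hypothetical callable: transcribe(audio_file, alpha, beta) -> str.
    """
    best = None
    for alpha, beta in product(alphas, betas):
        hypotheses = [transcribe(f, alpha, beta) for f in audio_files]
        score = wer(references, hypotheses)
        if best is None or score < best[0]:
            best = (score, alpha, beta)
    return best
```

It would be called as something like `grid_search(my_decoder, files, transcripts, alphas=[0.5, 0.75, 1.0], betas=[1.0, 2.0, 4.0])`, where the grid values are placeholders rather than recommendations. Because the score is averaged over the whole set, a pair that fixes the single letters can still lose on the number words, which is the trade-off I am seeing.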
What should the length of the sentences in my vocab file be?
Thanks a lot.