Hi guys, I am transcribing audio files that are about two minutes long. I have customized the language model using my own vocabulary. I have a few things to ask.
What should be the order of my language model (kenlm)?
My audio files contain words like zero, one, two, three … nine, twenty, thirty, forty, fifty, etc., and letters like a, b, c, d, e, f, … z.
E.g. transcript:
Your account number is four w nine seven ten five p eight h zero zero zero f b you have transferred five thousand nine hundred sixty eight dollars and ninety cents your remaining balance is four thousand eighty six dollars and forty cents.
Should I change the alpha and beta parameters?
I have tried to find the optimal values of alpha and beta, but it is a trade-off between certain words. It is not working as I expected.
What should be the length of the sentences in my vocab file?
Thanks a lot
((slow to reply) [NOT PROVIDING SUPPORT])
How did you customize it: by adding your own content, or by replacing the content with your own?
So according to your example transcript, the audio content is not just numbers.
That’s possible.
It’s always a trade-off. May I ask what accent your speakers have? It’s not impossible that this plays a part in your issue here.
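To make the alpha/beta trade-off measurable rather than judged word by word, one option is to score each pair by word error rate over the whole set of reference transcripts. This is only a minimal sketch: `transcribe(audio, alpha, beta)` is a hypothetical callable you would have to supply around your own decoder, and `samples` is assumed to be a list of `(audio, reference_transcript)` pairs.

```python
from itertools import product


def word_errors(reference, hypothesis):
    """Word-level edit distance between a reference and a hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # word missing in hypothesis
                cur[j - 1] + 1,                        # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def wer(pairs):
    """Word error rate over (reference, hypothesis) pairs."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words


def grid_search(transcribe, samples, alphas, betas):
    """Return (best_wer, alpha, beta) over the grid.

    `transcribe` is a hypothetical function (audio, alpha, beta) -> str;
    it is an assumption of this sketch, not part of any library.
    """
    best = None
    for alpha, beta in product(alphas, betas):
        pairs = [(ref, transcribe(audio, alpha, beta)) for audio, ref in samples]
        score = wer(pairs)
        if best is None or score < best[0]:
            best = (score, alpha, beta)
    return best
```

If only the account number and the amount matter, the same scoring could be restricted to those spans of each transcript instead of the full sentence.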
Thank you for replying.
The accent is American, and it works fine with the pre-trained model. I made my own language model (no previous vocabulary). For me, only the account number and the amount matter, but there are one or two mistakes in the account number and in the amount as well.
I added alphabet characters, digit pronunciations (one, two, thirty, forty, etc.) and amounts like hundred, thousand, etc.
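For illustration, a vocabulary file of that kind could also be filled with whole sentences shaped like the example transcript, so that the language model sees letters, digits and amounts in context rather than as isolated words. The sketch below is an assumption about how one might generate such text, not what was actually done here. The sentence template, the 14-token account number length and the function names are all made up for the example.

```python
import random

ONES = "zero one two three four five six seven eight nine".split()
TEENS = ("ten eleven twelve thirteen fourteen fifteen sixteen "
         "seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
LETTERS = list("abcdefghijklmnopqrstuvwxyz")


def number_to_words(n):
    """Spell out 0 <= n < 1,000,000 without 'and', as in the example transcript."""
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens - 2] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def amount_phrase(dollars, cents):
    """Spoken form of an amount; singular 'cent' is not handled in this sketch."""
    return f"{number_to_words(dollars)} dollars and {number_to_words(cents)} cents"


def make_sentence(rng):
    """One synthetic sentence following the (assumed) transcript template."""
    account = " ".join(rng.choice(ONES + LETTERS) for _ in range(14))
    sent = amount_phrase(rng.randrange(1, 10000), rng.randrange(100))
    left = amount_phrase(rng.randrange(1, 10000), rng.randrange(100))
    return (f"your account number is {account} you have transferred {sent} "
            f"your remaining balance is {left}")


def write_corpus(path, count, seed=0):
    """Write `count` sentences, one per line, to a hypothetical corpus file."""
    rng = random.Random(seed)
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(count):
            f.write(make_sentence(rng) + "\n")
```

Calling `write_corpus("vocab.txt", 10000)` (the file name is arbitrary) would produce a text file that can be passed to KenLM's `lmplz` with the `-o` option set to the order being tested. On a corpus this repetitive, `lmplz` may also need `--discount_fallback`.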
Did I make any wrong assumptions for the language model?
Should I fine-tune on my data?
If yes, I do not have smaller files of 5 to 7 seconds.
At most, I have 100 audio files.
Are 3 epochs enough to fine-tune?
((slow to reply) [NOT PROVIDING SUPPORT])
I can’t tell you that; you need to explore your dataset … But 100 audio files of 5 to 7 seconds does not seem like a lot, and I’m not sure you can get anything out of it …