Customizing Model

sibtainraza158 · June 21, 2019, 4:52pm

Hi guys, I am transcribing audio files that are two minutes long. I have customized the language model using my own vocabulary, and I have a few questions.
What should the order of my language model (KenLM) be?
My audio files contain number words like zero, one, two, three … nine, twenty, thirty, forty, fifty, etc. and letters a, b, c, d, e, f, … z.
Example transcript:

Your account number is four w nine seven ten five p eight h zero zero zero f b you have transferred five thousand nine hundred sixty eight dollars and ninety cents your remaining balance is four thousand eighty six dollars and forty cents.

Should I change the alpha and beta parameters?
I have tried to find the optimal values of alpha and beta, but it is a trade-off between certain words. It is not working as I expected.
What should the length of the sentences in my vocab file be?
Thanks a lot
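One way to make the alpha/beta trade-off measurable is to score every candidate pair against a few reference transcripts instead of judging individual words by ear. The sketch below is only an illustration: `decode` is a hypothetical callable standing in for whatever runs the model with a given alpha and beta, and the grid values are arbitrary placeholders, not recommended settings.

```python
from itertools import product


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def grid_search(decode, samples, alphas, betas):
    """Return (mean_wer, alpha, beta) for the best pair on the grid.

    decode  -- hypothetical callable: decode(audio, alpha, beta) -> text
    samples -- list of (audio, reference_transcript) pairs
    """
    best = None
    for alpha, beta in product(alphas, betas):
        wers = [word_error_rate(ref, decode(audio, alpha, beta))
                for audio, ref in samples]
        mean_wer = sum(wers) / len(wers)
        if best is None or mean_wer < best[0]:
            best = (mean_wer, alpha, beta)
    return best
```

Averaging over the whole held-out set keeps one pair from being picked just because it fixes a single word in a single file. If only the account number and the amounts matter, the same loop could score just those spans of the transcript.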

lissyx · June 24, 2019, 2:27pm

How did you customize it: by adding your own content, or by replacing the content with your own?

So according to your example transcript, the audio content is not just numbers.

That’s possible.

It’s always a trade-off. May I ask what accent your speakers have? It’s not impossible that this plays a part in your issue here.

sibtainraza158 · June 24, 2019, 3:19pm

Thank you for replying.
The accent is American, and it works fine with the pre-trained model. I made my own language model (no previous vocabulary). For me, only the account number and the amount matter. But there are one or two mistakes in the account number and in the amount as well.
I added the alphabet characters, the pronunciations of digits (one, two, thirty, forty, etc.) and amount words like hundred, thousand, etc.
Did I make any wrong assumptions for the language model?
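For reference, a vocabulary file of this kind could be generated rather than written by hand. The following is a rough sketch, assuming the sentence template from the example transcript above; the helper names, the account number length range and the amount range are all made up for illustration.

```python
import random

DIGITS = "zero one two three four five six seven eight nine".split()
TEENS = ("ten eleven twelve thirteen fourteen fifteen sixteen "
         "seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()
LETTERS = list("abcdefghijklmnopqrstuvwxyz")


def below_hundred(n):
    """Spell out 0..99."""
    if n < 10:
        return DIGITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, unit = divmod(n, 10)
    return TENS[tens - 2] + ("" if unit == 0 else " " + DIGITS[unit])


def below_thousand(n):
    """Spell out 1..999."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(DIGITS[hundreds] + " hundred")
    if rest:
        parts.append(below_hundred(rest))
    return " ".join(parts)


def number_to_words(n):
    """Spell out 0..999999 the way the example transcript does."""
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(below_thousand(thousands) + " thousand")
    if rest:
        parts.append(below_thousand(rest))
    return " ".join(parts)


def make_sentence(rng):
    """One synthetic sentence; lengths and ranges are arbitrary assumptions."""
    account = " ".join(rng.choice(DIGITS + LETTERS)
                       for _ in range(rng.randint(10, 16)))
    def amount():
        return (number_to_words(rng.randint(1, 99999)) + " dollars and "
                + below_hundred(rng.randint(0, 99)) + " cents")
    return ("your account number is " + account
            + " you have transferred " + amount()
            + " your remaining balance is " + amount())


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the output is reproducible
    for _ in range(5):
        print(make_sentence(rng))
```

Writing a few thousand such lines to a text file gives sentences with the same structure as the real audio, and that file can then be passed to KenLM's `lmplz`, where the `-o` option sets the order.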

sibtainraza158 · June 24, 2019, 3:20pm

And yes, the account number contains letters as well.

lissyx · June 24, 2019, 3:38pm

No, but it’s possible the current dataset does not contain enough numbers, and thus acoustic recognition is not good enough to help the LM?

sibtainraza158 · June 24, 2019, 3:42pm

Should I fine-tune on my data?
If yes, I do not have smaller files of 5 to 7 seconds.
At most I have 100 audio files.
Are 3 epochs enough to fine-tune?

lissyx · June 25, 2019, 6:21am

I can’t tell you that, you need to explore your dataset … But 100 audio files of 5 to 7 seconds does not seem like a lot, I’m not sure you can get anything out of it …
