I have trained DeepSpeech on a Hinglish (Hindi-English mixed) dataset of approximately 2000 hours.
Parameters: learning rate: 0.0001, dropout: default (results did not improve with 0.08), n_hidden: 2048, lm_alpha: 1.8, lm_beta: 2.5 (I tried tweaking these and got good results with these lm_alpha and lm_beta values).
Results: training loss: 48.66, validation loss: 54.87, test loss: 54.95, test WER: 0.24, CER: 13.90.
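For anyone who wants to sanity-check these numbers on individual transcripts, here is a minimal sketch of WER and CER as normalized edit distances. This is the textbook definition, not necessarily how the DeepSpeech report scales its figures, and the function names are my own.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


if __name__ == "__main__":
    # Hypothetical sentence pair, built from the misrecognitions described in this post.
    ref = "clean the pre-filter and the membrane"
    hyp = "clean the prefer and the mam"
    print("WER:", wer(ref, hyp))
    print("CER:", cer(ref, hyp))
```

Running this per utterance makes it easier to see whether the errors are concentrated on a few words (such as the English nouns) or spread evenly.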
I am not getting good transcripts from one-shot inference from the checkpoint: a word is predicted correctly in one sentence and misspelled in the following sentences. Sometimes a word is predicted correctly and sometimes it is skipped in the same context.
For example, “pre-filter” becomes “prefer” or “fiction”, “membrane” becomes “mam”, etc.
I have the following questions:
Is the acoustic model working well, given the losses above?
Is there a problem with decoding? If so, how can I improve it?
Do I need to look into the language model?
I am getting very bad results from the same trained model. I tried rebuilding TensorFlow and changing the beam width in client.py to 1024; the results improved, but the output still skips a lot of words and adds some new words that were not spoken. Please help.
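To make the role of lm_alpha and lm_beta concrete, here is a rough sketch of the scoring commonly used in CTC beam search with an external language model: acoustic log-probability plus alpha times the LM log-probability plus beta times the word count. I am assuming DeepSpeech's decoder follows this general form; the candidate texts and all numbers below are made up for illustration.

```python
def combined_score(acoustic_logp, lm_logp, n_words, alpha, beta):
    """Score of one beam candidate: acoustic term + weighted LM term + word bonus."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


def best_candidate(candidates, alpha, beta):
    """Pick the best text from a list of (text, acoustic_logp, lm_logp) tuples."""
    def score(candidate):
        text, acoustic_logp, lm_logp = candidate
        return combined_score(acoustic_logp, lm_logp, len(text.split()), alpha, beta)
    return max(candidates, key=score)[0]


if __name__ == "__main__":
    # Hypothetical candidates: the acoustic model slightly prefers "mam",
    # while the language model strongly prefers "membrane".
    candidates = [
        ("the membrane", -12.0, -9.0),
        ("the mam", -11.0, -14.0),
    ]
    print(best_candidate(candidates, alpha=0.0, beta=2.5))  # LM ignored
    print(best_candidate(candidates, alpha=1.8, beta=2.5))  # LM weighted in
```

In this form, a larger beta rewards hypotheses with more words and a larger alpha lets the LM override the acoustic evidence, so both skipped and inserted words can be sensitive to these two values as well as to the beam width.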
I am using 2000 hours of Hindi-English mixed conversational speech (mostly Hindi); the Hindi in my labels is transcribed in Roman script.
I am not sure about decoding; I am only asking about it based on the results I mentioned above.
I built my vocabulary from the training transcripts and then used KenLM to build the LM binary and trie.
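In case it helps others reproduce the vocabulary step, here is a minimal sketch of normalizing transcripts before passing them to KenLM. I am assuming an alphabet of lowercase a-z, apostrophe and space; the `ALLOWED` set and function names are hypothetical and should be adapted to the actual alphabet file.

```python
import re

# Assumption: the acoustic model's alphabet is lowercase a-z, apostrophe and space.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz' ")


def normalize_line(text, allowed=ALLOWED):
    """Lowercase, replace out-of-alphabet characters with spaces, collapse whitespace."""
    text = text.lower()
    cleaned = "".join(ch if ch in allowed else " " for ch in text)
    return re.sub(r"\s+", " ", cleaned).strip()


def build_vocab_lines(transcripts, allowed=ALLOWED):
    """Return one normalized sentence per transcript, skipping empty results."""
    lines = []
    for transcript in transcripts:
        line = normalize_line(transcript, allowed)
        if line:
            lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical transcripts in Romanized Hinglish.
    sample = ["Pre-filter  ko CLEAN karo!", "   ", "Membrane change karna hai."]
    for line in build_vocab_lines(sample):
        print(line)
```

The point of the sketch is that the LM text and the acoustic labels should go through the same normalization. A word such as "pre-filter" that is spelled one way in the labels and another way in the LM text would be one possible source of inconsistent spellings.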
The main problem I am facing is that some English words (especially nouns) are predicted correctly in one place and misspelled in other places in the same audio.
((slow to reply) [NOT PROVIDING SUPPORT])
Could it just be noise, accents, or ways of speaking that explain it? 2000 hours is not that much, even though it’s already a good amount.
Also, about the nouns: is it likely that they are made of sounds that are less frequent in the rest of the dataset?
Correct me if I am wrong: would it help to train an LM with these keywords on top of the current language model, or to add some noun-related language information to the LM? Please suggest whether that would be possible.
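Before changing the LM, it may be worth checking how often the problem words actually occur in the text the LM was built from. A minimal sketch, assuming the corpus is one sentence per line and using a hypothetical keyword list:

```python
from collections import Counter


def keyword_counts(corpus_lines, keywords):
    """Count how often each keyword appears as a whole word in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        counts.update(line.lower().split())
    return {kw: counts[kw.lower()] for kw in keywords}


if __name__ == "__main__":
    # Hypothetical LM training lines and keywords.
    corpus = [
        "membrane ko change karo",
        "filter clean karo",
        "membrane theek hai",
    ]
    print(keyword_counts(corpus, ["membrane", "prefilter"]))
```

Keywords with zero or very low counts are candidates for adding more in-domain sentences to the LM text; whether that fixes the misspellings is something only an experiment can confirm.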
One last question that I asked earlier:
I am getting very bad results when I predict from the model: some sentences are skipped and some new words are added that are not in my vocabulary. To address this I rebuilt TensorFlow and set BEAM_WIDTH=1024, as suggested in “Getting better prediction accuracy during inshot-inference from checkpoint but less accuracy on trained model?”.
My results improved, but some transcriptions are still missing and the sentences do not seem grammatically correct.