I have trained DeepSpeech on a Hinglish (Hindi-English mixed) dataset of approximately 2000 hours.
Parameters: learning rate 0.0001, dropout: default (results did not improve at 0.08), n_hidden: 2048, lm_alpha: 1.8, lm_beta: 2.5 (I tried tweaking these, and this pair gave the best results; a sketch of the kind of sweep I mean is below the results).
Results: training loss 48.66, validation loss 54.87, test loss 54.95, test WER 0.24, CER 13.90.
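For context, the tweaking above was essentially a grid sweep over lm_alpha/lm_beta on a small dev set. A minimal sketch of what I mean, assuming the DeepSpeech >= 0.7 Python API (`Model.setScorerAlphaBeta`); the model/scorer paths and the dev set are placeholders, not my actual files:

```python
import itertools
import wave

import numpy as np
from deepspeech import Model

ds = Model("output_graph.pbmm")             # placeholder path
ds.enableExternalScorer("hinglish.scorer")  # placeholder path

def wer(ref, hyp):
    # Word-level Levenshtein distance divided by reference length.
    r, h = ref.split(), hyp.split()
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[-1][-1] / max(len(r), 1)

# Placeholder dev set: (wav path, reference transcript) pairs.
dev_set = [("clip1.wav", "the membrane sits behind the pre-filter")]

for alpha, beta in itertools.product([1.0, 1.4, 1.8, 2.2], [1.5, 2.0, 2.5]):
    ds.setScorerAlphaBeta(alpha, beta)
    scores = []
    for path, ref in dev_set:
        with wave.open(path) as w:
            audio = np.frombuffer(w.readframes(w.getnframes()), np.int16)
        scores.append(wer(ref, ds.stt(audio)))
    print(alpha, beta, sum(scores) / len(scores))
```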
I am not getting good predictions (one-shot inference from the checkpoint): a word is predicted correctly in one sentence and misspelled in successive sentences. Sometimes it predicts a word correctly, and sometimes it skips it entirely in the same context.
Example: “pre-filter” becomes “prefer” or “fiction”, “membrane” becomes “mam”, etc.
I have the following questions:
- Is the acoustic model working well given the above losses?
- Is there a problem with the decoding? If so, how can I improve it?
- Do I need to look into the language model?
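On the language model question, one check that would help me is whether these domain words are even in the LM vocabulary. A minimal sketch, assuming the kenlm Python bindings and the binary LM behind the scorer (“lm.binary” is a placeholder path):

```python
import kenlm

lm = kenlm.Model("lm.binary")  # placeholder path to the decoding LM
for word in ["pre-filter", "prefilter", "membrane"]:
    # full_scores yields (log10 prob, ngram length, is_oov) per token.
    _, _, oov = next(lm.full_scores(word, bos=False, eos=False))
    print(f"{word}: {'OOV' if oov else 'in vocabulary'}")
```

If the domain words come back OOV, the beam search can never hypothesize them, which would explain substitutions like “membrane” to “mam”.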
I am getting very bad results from the same trained model. I tried rebuilding TensorFlow and changing the beam width in client.py to 1024; results improved, but it still skips a lot of words and adds new words that were not spoken. Please help.
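For reference, here is that beam-width change translated to the newer Python API, where no TensorFlow rebuild is needed (assuming DeepSpeech >= 0.7; in my older setup the value sits in client.py):

```python
from deepspeech import Model

ds = Model("output_graph.pbmm")             # placeholder path
ds.enableExternalScorer("hinglish.scorer")  # placeholder path
ds.setBeamWidth(1024)  # the wider beam I tried, set at runtime
```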
Any help is appreciated.
Thanks