I have around 55-60 hours of Indian speech (Common Voice [indian] + Indic TTS + YouTube, where 8-9 hours of YouTube data were augmented to ~30 hours). I’m trying to train DeepSpeech on this dataset; however, I’m getting very poor output:
Sample ground truth: ‘This is lovely’
Sample result: ‘l’
I was initially using the 0.3.0 checkpoint, and I’m now also trying the 0.4.1 checkpoint. I’ve trained for around 10 epochs (starting from the checkpoint) with a train batch size of 24 and dev and test batch sizes of 8 each.
The result is always a single letter.
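To put a number on how far off this is, here is a minimal sketch that computes a character error rate (CER) for the sample above. The helper names `levenshtein` and `cer` are my own for illustration and are not part of DeepSpeech; lowercasing both strings before comparing is also my assumption.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    # Assumption: compare case-insensitively and ignore surrounding whitespace.
    ref, hyp = ref.lower().strip(), hyp.lower().strip()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Sample from this post: 13 of the 14 reference characters are missing.
    print(f"{cer('This is lovely', 'l'):.3f}")  # prints 0.929
```

So for this sample roughly 93% of the reference characters are wrong or missing.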
If anyone has worked on Indian-language speech, could you kindly point me in the right direction?
Any help would be greatly appreciated. Thank you.