Hello Team! I am training a German speech model using DeepSpeech and struggling to find a good set of hyperparameters. I followed the hyperparameters documented in the DeepSpeech releases, but the model is not producing great results (WER around 30%). When I increase the number of epochs beyond 10, the model overfits, i.e. training loss drops below 5 while validation loss climbs above 100. Is there a way to tune the hyperparameters in DeepSpeech, using grid search or some other method?
I am using approximately 300 hours of data, with the following parameters:
Independent of hyperparameters, 300 hours alone is not sufficient to train a model capable of understanding unrestricted speech. For English we use more than 10 times that amount of speech and that’s still not enough.
That said, if 300 hours is all you have, I'd try fine-tuning the 0.4.1 English model with your German data, using the standard character substitutions ä=>ae, ü=>ue, ö=>oe, and ß=>ss to account for the fact that the alphabets differ.
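As a rough sketch of that substitution on the transcript side, assuming your training CSVs use the usual wav_filename, wav_filesize, transcript columns (the file names below are just placeholders):

```python
import csv

# Map German-specific characters onto the English alphabet used by the
# released 0.4.1 checkpoint (same substitutions as suggested above).
SUBSTITUTIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def normalize(text):
    """Lower-case a transcript and replace characters missing from the English alphabet."""
    text = text.lower()
    for src, dst in SUBSTITUTIONS.items():
        text = text.replace(src, dst)
    return text

def convert_csv(in_path, out_path):
    """Rewrite a DeepSpeech training CSV with alphabet-compatible transcripts."""
    with open(in_path, newline="", encoding="utf-8") as fin, \
         open(out_path, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            row["transcript"] = normalize(row["transcript"])
            writer.writerow(row)

if __name__ == "__main__":
    # Placeholder file names; point these at your own CSVs.
    convert_csv("de_train.csv", "de_train_ascii.csv")
```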
As for the hyperparameters, start by trying this fine-tuning with the ones you are already using.
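DeepSpeech does not ship a built-in grid search, but since training is driven entirely by DeepSpeech.py flags, you can script a small sweep yourself. Below is a rough sketch assuming 0.4.1-style flag names (--epoch, --learning_rate, --dropout_rate, --checkpoint_dir) and placeholder CSV paths; verify the exact flag names against the release you are running, since they change between versions.

```python
import itertools
import subprocess

# Hyperparameter grid to sweep; these values are only examples.
GRID = {
    "learning_rate": [0.0001, 0.00005, 0.00001],
    "dropout_rate": [0.15, 0.25, 0.35],
}

# Flag names follow the 0.4.1 release; double-check them for your version.
BASE_ARGS = [
    "python", "DeepSpeech.py",
    "--train_files", "de_train_ascii.csv",   # placeholder paths
    "--dev_files", "de_dev_ascii.csv",
    "--test_files", "de_test_ascii.csv",
    "--epoch", "10",
]

for lr, dropout in itertools.product(GRID["learning_rate"], GRID["dropout_rate"]):
    # Give each run its own checkpoint directory so results do not overwrite each other.
    ckpt_dir = "checkpoints/lr{}_do{}".format(lr, dropout)
    cmd = BASE_ARGS + [
        "--learning_rate", str(lr),
        "--dropout_rate", str(dropout),
        "--checkpoint_dir", ckpt_dir,
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

Since each run writes to its own checkpoint directory, you can compare the dev/test loss DeepSpeech.py reports per run and keep the best combination.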
lissyx:
@agarwalaashish20 To complement that answer, I can confirm that with a somewhat lower amount of audio, around 235 hours of French including the released Common Voice data, and fine-tuning on top of English with a compatible alphabet, actual field usage under proper conditions (speaking slowly, articulating, etc.) gives decent enough results, even though the WER/CER on the test set are not really awesome.
lissyx: