I’ve been running different fine-tuning tests on the 0.1.1 release, training on single voices to see if I could improve the model’s performance. I train with the default command:
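(Quoting from memory, so the exact flag names may differ slightly between releases, and the paths are placeholders for my own CSVs — something along these lines:)

```
python -u DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --checkpoint_dir /path/to/0.1.1-checkpoint \
  --epoch 3 \
  --learning_rate 0.0001
```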
I trained with 1 hour of audio for 1-3 epochs and the model never seems to improve. I’m wondering if I’m doing something wrong or if the code isn’t working. Has anyone else here successfully trained a better-performing model by fine-tuning on top of DeepSpeech?
It is trained on 10-second chunks. How do you know it works fine? What type of model did you train — was it a new language or a new voice? And how did you measure it? What learning rate did you use, and how many epochs? I would appreciate any information you can share, thank you!
I started from the frozen model of 0.1.1 and trained on a new-voice dataset with the default hyperparameters. How do I know it works? It works, since it works. The only trick that comes to mind: at first I tried to train on chunks of equal length, segmenting my large audio file into 3-second segments. This didn’t work. It seems that somehow it prefers unequal lengths.
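If it helps, here is a rough sketch of how you could cut at natural pauses instead of fixed windows, so the chunk lengths come out unequal. This uses pydub and is not what I actually ran — the filename and thresholds are just guesses you would tune for your own audio:

```python
# Sketch: split a long recording at pauses so chunk lengths vary,
# instead of fixed 3-second windows. Thresholds below are assumptions
# to tune per recording.
from pydub import AudioSegment
from pydub.silence import split_on_silence

audio = AudioSegment.from_wav("large_recording.wav")  # placeholder path

chunks = split_on_silence(
    audio,
    min_silence_len=300,             # a pause of at least 300 ms marks a cut point
    silence_thresh=audio.dBFS - 16,  # 16 dB below average loudness counts as silence
    keep_silence=100,                # keep 100 ms of padding around each chunk
)

# Write each variable-length chunk out as its own WAV file
for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:04d}.wav", format="wav")
```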