Hello!
I have tried training the pre-trained deepspeech-0.6.1 model on a dataset containing 3,400 audio files in ‘train’, 430 in ‘dev’ and 430 in ‘test’.
I have not changed any hyperparameters during training.
I had a few questions:
(1) In my case, I set the number of epochs to 3 and the model trained from epoch 0 up to epoch 2.
When the model continues training from the last checkpoint, in this case ‘step 233784’, should the epoch counter start from ‘0’ again?
(2) In the ‘training from pretrained model’ command, I specified the output_dir in which to store the new model.
The next time I want to continue training on that same model (the output), what command should I use? Nowhere do we specify the directory in which the model exists, not even in the ‘training from pretrained model’ command.
(3) What indicators point to overfitting? What should the training and validation losses look like in each epoch?
(4) After testing this dataset, here are the results:
WER=0.989837, CER=0.604864, loss=12.066513
As you can tell, these are really poor results. How should I start changing the hyperparameters to improve my WER? (I have not made any changes to the batch sizes or the learning rate.)
Thanks in advance
The model is just the exported output; you can only continue training from checkpoints, not from models.
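If your first run wrote checkpoints to some directory, just point the next run at that same --checkpoint_dir and training resumes from the latest checkpoint there. A minimal sketch for 0.6.1 (CSV paths and directory names are placeholders; run `python3 DeepSpeech.py --helpfull` to see all flags):

```
# Resume training: DeepSpeech.py loads the most recent checkpoint
# it finds in --checkpoint_dir and continues from there.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --test_files data/test.csv \
  --checkpoint_dir my_finetune_checkpoints/ \
  --epochs 3 \
  --export_dir my_finetune_model/
```

--export_dir only controls where the inference model gets exported; it plays no role when resuming.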
Bad results :-) The loss should decrease for both train and dev. You have very little material, so your results will not be great. Maybe post an example?
Search this forum for fine-tuning or transfer learning. I would start with a learning rate of 0.0001 (1e-4) or lower, maybe 1e-5 or even less; search the forum and you’ll find a lot.
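For example, the same command as above with the learning rate turned down (the value here is just a starting point, not a recommendation):

```
# Fine-tune from the existing checkpoints with a reduced learning rate.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --checkpoint_dir my_finetune_checkpoints/ \
  --learning_rate 0.00001 \
  --epochs 3
```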
Thank you so much!
I had another question…
So I’ve trained the pretrained model on two datasets. The first one (3,400 train files) gave a WER of 0.71. After training it with a bigger dataset (84,000 train files), the WER is just 0.126 (on that dataset), which is awesome.
However, when I tried to transcribe a test.wav file using both the original pretrained model and the model I trained (on the two datasets), the pretrained model gives much better accuracy.
How do I improve the accuracy of my model?
So, you got 12% WER on the test set of your new model, but when you transcribed an arbitrary sound file, the pretrained model performed better?
Well, I think that’s expected, since the 0.6.1 model achieves 6.5% WER.
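Also, make sure the comparison is fair: run both graphs through the client the same way, with the same language model and trie. Roughly like this with the 0.6.1 client (the paths are placeholders for wherever your files live):

```
# Pretrained 0.6.1 model
deepspeech --model deepspeech-0.6.1-models/output_graph.pbmm \
           --lm deepspeech-0.6.1-models/lm.binary \
           --trie deepspeech-0.6.1-models/trie \
           --audio test.wav

# Your fine-tuned export, decoded with the same LM and trie
deepspeech --model my_finetune_model/output_graph.pb \
           --lm deepspeech-0.6.1-models/lm.binary \
           --trie deepspeech-0.6.1-models/trie \
           --audio test.wav
```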
Another thing that I’m pretty sure is involved here is that you may be overfitting that model. Let me explain:
The 0.6.1 model was trained to the point where, if you continued training, it would overfit, that is, keep reducing the training loss as much as possible, but at the cost of the test loss, which is something we do not want.
So, did you notice that your training loss kept decreasing while your dev loss didn’t, or improved only marginally, like 0.1% per epoch? If that’s the case, then the model you have is overfitted.
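If you want the run to stop by itself once the dev loss plateaus, 0.6.1 also ships an early-stopping mechanism; if I remember right it’s enabled with --early_stop, but double-check `python3 DeepSpeech.py --helpfull` since I’m quoting from memory:

```
# Stop training automatically when the validation loss stops improving.
python3 DeepSpeech.py \
  --train_files data/train.csv \
  --dev_files data/dev.csv \
  --checkpoint_dir my_finetune_checkpoints/ \
  --early_stop
```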