Hello,
I fine-tuned the DeepSpeech English model with my custom data (5–7 second audio clips, about 3 hours of audio in total, most of it machine-generated through data augmentation).
First I trained for 3 epochs (lr = 0.0001, batch_size = 16, no dropout).
The loss kept decreasing down to a value of about 3. Then I tested the model and the result was really bad, much worse than before fine-tuning. I tried fine-tuning again for 10 epochs with the same hyperparameters and the loss went down to 0.8, but the predictions were still really, really bad… I also want to mention that I did not get any warnings during fine-tuning. Overfitting?
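For context, a fine-tuning run with these hyperparameters looks roughly like the sketch below. This is an illustration based on the DeepSpeech training script, not my exact command: the file paths are placeholders, and flag names may differ between DeepSpeech releases.

```shell
# Sketch of a DeepSpeech fine-tuning invocation with the hyperparameters above.
# Paths (my_data/..., deepspeech-checkpoint/) are placeholders.
python3 DeepSpeech.py \
  --train_files my_data/train.csv \
  --dev_files my_data/dev.csv \
  --test_files my_data/test.csv \
  --checkpoint_dir deepspeech-checkpoint/ \
  --epochs 3 \
  --learning_rate 0.0001 \
  --train_batch_size 16 \
  --dropout_rate 0
```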
I ran the same experiment with German data and the transcription improved after fine-tuning, even though I used a smaller data set. The loss there was about 60.
Could someone give me some advice or tell me what could be going wrong? Should I use dropout = 0.4?