Model performing badly when fine-tuning

Hello,

I fine-tuned the DeepSpeech model using the command below:

python3 DeepSpeech.py --n_hidden 2048 --checkpoint_dir ~/workspace/chib/model/DeepSpeech/checkPointsDir/ --epochs 1 --train_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/train.csv --dev_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/dev.csv --test_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv --learning_rate 0.0001

The test file is the same as the test file of the Common Voice data shared on GitHub, but I find the results a bit strange.
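
For reference, a quick way to sanity-check the CSV layout (assuming the usual DeepSpeech importer columns wav_filename, wav_filesize, transcript) is something like:

# Sanity check of the test CSV layout (assuming the standard DeepSpeech
# importer columns: wav_filename, wav_filesize, transcript).
import csv
import os

csv_path = os.path.expanduser(
    "~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv")

with open(csv_path, newline="") as f:
    reader = csv.DictReader(f)
    print(reader.fieldnames)  # expect ['wav_filename', 'wav_filesize', 'transcript']
    for i, row in enumerate(reader):
        print(row["wav_filename"], "->", row["transcript"])
        if i >= 2:  # peek at the first few rows only
            break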

Testing model on /home/aml/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv
Test epoch | Steps: 43 | Elapsed Time: 0:00:21
Test on /home/aml/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv - WER: 0.873239, CER: 0.637005, loss: 90.517876

WER: 1.000000, CER: 0.642857, loss: 26.292532

  • src: “do you want me”
  • res: “no man no”

WER: 1.000000, CER: 0.812500, loss: 27.239510

  • src: “what was he like”
  • res: “maimie”

WER: 1.000000, CER: 0.642857, loss: 38.179359

  • src: “do you mean it”
  • res: “more rent”

WER: 1.000000, CER: 0.666667, loss: 40.403831

  • src: “you wanna take this outside”
  • res: “i want missy”

WER: 1.000000, CER: 0.689655, loss: 46.982464

  • src: “that would be funny if he did”
  • res: “at obfuscated”

WER: 1.000000, CER: 0.818182, loss: 54.964172

  • src: “what do you advise sir”
  • res: “alabaster”

WER: 1.000000, CER: 0.863636, loss: 58.110783

  • src: “i’m so glad to see you”
  • res: “in preparing”

WER: 1.000000, CER: 0.842105, loss: 61.973919

  • src: “she’ll be all right”
  • res: “every”

WER: 1.000000, CER: 0.800000, loss: 72.811066

  • src: “you yellow giant thing of the frost”
  • res: “a alcantro”

WER: 1.000000, CER: 0.535714, loss: 74.692726

  • src: “groves started writing songs when she was four years old”
  • res: “go i started it in song washpool”

Can anyone tell me why the results seem bad?

Can you explain more precisely the fine-tuning you are doing?

Also, please use proper code formatting for console output in your message; it is very hard to read otherwise.

This is the command I am running

python3 DeepSpeech.py --n_hidden 2048 \
  --checkpoint_dir ~/workspace/chib/model/DeepSpeech/checkPointsDir/ \
  --epochs 1 \
  --train_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/train.csv \
  --dev_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/dev.csv \
  --test_files ~/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv \
  --learning_rate 0.0001 \
  --export_dir ~/workspace/chib/model/DeepSpeech/newModels/

I am running the above command to fine-tune the model from the checkpoint available here.

The dataset I am using for training/validation/testing is the same Common Voice data.

The default hyperparameters I am using for fine-tuning are:

  • train_batch_size 24
  • dev_batch_size 48
  • test_batch_size 48
  • n_hidden 2048
  • learning_rate 0.0001
  • dropout_rate 0.15
  • epoch 75
  • lm_alpha 0.75
  • lm_beta 1.85

The output I am getting is

Test epoch | Steps: 43 | Elapsed Time: 0:00:21
Test on /home/aml/workspace/chib/data/DeepSpeech/commonVoiceData/data/clips/testNew.csv - WER: 0.873239, CER: 0.637005, loss: 90.517876

Some sample test outputs

WER: 1.000000, CER: 0.642857, loss: 26.292532
 - src: "do you want me"
 - res: "no man no"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.812500, loss: 27.239510
 - src: "what was he like"
 - res: "maimie"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.642857, loss: 38.179359
 - src: "do you mean it"
 - res: "more rent"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.666667, loss: 40.403831
 - src: "you wanna take this outside"
 - res: "i want missy"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.689655, loss: 46.982464
 - src: "that would be funny if he did"
 - res: "at obfuscated"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.818182, loss: 54.964172
 - src: "what do you advise sir"
 - res: "alabaster"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.863636, loss: 58.110783
 - src: "i'm so glad to see you"
 - res: "in preparing"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.842105, loss: 61.973919
 - src: "she'll be all right"
 - res: "every"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.800000, loss: 72.811066
 - src: "you yellow giant thing of the frost"
 - res: "a alcantro"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 0.535714, loss: 74.692726
 - src: "groves started writing songs when she was four years old"
 - res: "go i started it in song washpool"
--------------------------------------------------------------------------------
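
For reference on the numbers above: as I understand it, the per-sample WER is the word-level edit distance between result and source divided by the number of words in the source, and CER is the same at character level. A minimal sketch (not the actual DeepSpeech evaluation code, just an illustration) that reproduces the first sample's values:

# Rough per-sample WER/CER illustration (not the DeepSpeech evaluation code).
def edit_distance(ref, hyp):
    # Standard Levenshtein distance computed with a single rolling row.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[len(hyp)]

src = "do you want me"
res = "no man no"
wer = edit_distance(src.split(), res.split()) / len(src.split())
cer = edit_distance(src, res) / len(src)
print(f"WER: {wer:.6f}, CER: {cer:.6f}")  # WER: 1.000000, CER: 0.642857

Because the results share almost no words with the sources, every sample shown ends up at WER 1.000000.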

I hope this is readable now. Can you tell me why the model performs badly after fine-tuning?

Have you tried more epochs and a lower learning rate?

Try a much lower learning rate; start around 10^-6.

What are you fine-tuning on, though (dataset-wise)? Are you just retraining the pre-trained model on Common Voice English?

Initially I fine-tuned with my own data, but the model showed very bad results, so I decided to fine-tune with the same data the model was originally trained on in order to track its behavior.
Yes, I am now training two models, with 5 and 30 epochs, to see the difference.

Whatever you do, make sure the LR is low (on the order of 10^-6). Pre-trained models already have very low loss, and larger steps break the model.
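
As a toy illustration (nothing to do with DeepSpeech internals): plain gradient descent on a simple quadratic shows how a step that is too large makes the parameter overshoot and blow up instead of settling, which is what happens to a checkpoint that is already near a minimum.

# Toy example: gradient descent on f(w) = w**2, starting close to the minimum.
# A small step keeps refining the solution; an oversized one diverges.
def descend(lr, w=0.1, steps=5):
    trajectory = [w]
    for _ in range(steps):
        w = w - lr * 2 * w  # gradient of w**2 is 2*w
        trajectory.append(w)
    return trajectory

print(descend(lr=0.01))  # shrinks towards 0: 0.1, 0.098, 0.09604, ...
print(descend(lr=1.5))   # oscillates and grows: 0.1, -0.2, 0.4, -0.8, ...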

Yes, I am trying more epochs and a lower learning rate. Also, I read here that they mention using a negative epoch value, --epcoh -5, and I am following that as well.

There’s a typo there (the flag is --epoch, not --epcoh), and negative epoch values are not supported anymore.

OK, thanks. I will now train the model with a 10^-6 learning rate and 10 epochs.