Transfer learning to Urdu with a small amount of data - better approach?

Hey there, I am building a subtitle-generation system for Urdu videos using DeepSpeech 0.9.3, and I am using transfer learning for it. The steps I followed for transfer learning to Urdu are:

  1. Cloned the code from the v0.9.3 branch of the repo (--branch v0.9.3)

  2. Downloaded checkpoint from 0.9.3 release page

  3. Built a custom scorer for my own model according to the tutorial in the playbook (a rough sketch of the scorer commands is included after the training command below)

  4. Dropped the source layers and trained the model on my data. The training command is:

    !python3 DeepSpeech.py --train_cudnn True --early_stop False \
      --es_epochs 10 \
      --n_hidden 2048 --epochs 25 \
      --learning_rate 0.0001 --dropout_rate 0.08 \
      --alphabet_config_path /content/drive/MyDrive/alphabet-urdu.txt \
      --save_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --load_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --export_dir /content/drive/MyDrive/models/ --export_file_name 'ft_model' \
      --train_files /content/train.csv \
      --dev_files /content/dev.csv \
      --test_files /content/test.csv \
      --scorer_path /content/drive/MyDrive/urdu_scorer/kenlm-urdu.scorer \
      --augment reverb[p=0.2,delay=50.0~30.0,decay=10.0:2.0~1.0] \
      --augment volume[p=0.2,dbfs=-10:-40] \
      --augment pitch[p=0.2,pitch=1~0.2] \
      --augment tempo[p=0.2,factor=1~0.5]
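For reference, the custom scorer in step 3 was built with the playbook's scoring scripts; the commands look roughly like the following. The corpus path, KenLM path, and the default alpha/beta values below are placeholders rather than my exact settings, and generate_scorer_package is the prebuilt binary from the 0.9.3 native_client package.

    !python3 data/lm/generate_lm.py --input_txt /content/urdu_corpus.txt --output_dir /content/urdu_scorer/ \
      --top_k 500000 --kenlm_bins /content/kenlm/build/bin/ \
      --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" \
      --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback

    !./generate_scorer_package --alphabet /content/drive/MyDrive/alphabet-urdu.txt \
      --lm /content/urdu_scorer/lm.binary --vocab /content/urdu_scorer/vocab-500000.txt \
      --package /content/drive/MyDrive/urdu_scorer/kenlm-urdu.scorer \
      --default_alpha 0.931289039105002 --default_beta 1.1834137581510284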

I have about 45 hours of training data. After about 30 epochs it gives an empty response, but I understand from posts on Discourse that I need to increase the number of epochs.

Right now, I am training the model with the default --n_hidden 2048, as the available checkpoints don't work with any other geometry and I need them for transfer learning.

But I wanted to ask: will training for a large number of epochs make my model eventually work even with so little data, or should I go for a smaller model and train from scratch instead of transfer learning? What would be the right approach? Kindly guide me; any help would be appreciated. Thanks!

Suggestions:

  • Turn off early stopping
  • Train for 100 epochs
  • Increase the dropout rate; 0.2 is probably fine
  • You can increase the learning rate to 0.001
  • You're missing the --drop_source_layers flag; I suggest you use 2.
  • For SpecAugment, try: --augment frequency_mask[p=0.8,n=2:4,size=2:4] --augment time_mask[p=0.8,n=2:4,size=10:50,domain=spectrogram] (a combined command is sketched below)
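Put together with the paths from your command, that would look roughly like this; treat the numbers as starting points to tune rather than exact values:

    !python3 DeepSpeech.py --train_cudnn True --early_stop False \
      --drop_source_layers 2 \
      --n_hidden 2048 --epochs 100 \
      --learning_rate 0.001 --dropout_rate 0.2 \
      --alphabet_config_path /content/drive/MyDrive/alphabet-urdu.txt \
      --save_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --load_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --export_dir /content/drive/MyDrive/models/ --export_file_name 'ft_model' \
      --train_files /content/train.csv \
      --dev_files /content/dev.csv \
      --test_files /content/test.csv \
      --scorer_path /content/drive/MyDrive/urdu_scorer/kenlm-urdu.scorer \
      --augment frequency_mask[p=0.8,n=2:4,size=2:4] \
      --augment time_mask[p=0.8,n=2:4,size=10:50,domain=spectrogram]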

You can check out my results for different amounts of data using transfer learning here.

The best result I got was for Basque: with 10 hours of data, a CER of 6% and a WER of 20%.

I'd also recommend checking your alphabet and making sure that you aren't trying to predict things for which you have little data or which have little connection to the audio. For Urdu, if you're using found data, you will also want to check the encodings etc. to avoid presentation forms.
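As a concrete example of the encoding check, this one-liner (using the alphabet path from your training command) lists any characters from the Arabic Presentation Forms blocks (U+FB50–U+FDFF and U+FE70–U+FEFF) that have ended up in your alphabet; ideally it prints an empty list:

    !python3 -c "print([(c, hex(ord(c))) for c in open('/content/drive/MyDrive/alphabet-urdu.txt', encoding='utf-8').read() if 0xFB50 <= ord(c) <= 0xFDFF or 0xFE70 <= ord(c) <= 0xFEFF])"

The same check is worth running over the transcript column of your CSVs, since found data often mixes presentation forms with the regular Arabic block.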


Thank you so much for this. I’ll try these suggestions and check out your results too.

And I had added --drop_source_layers 2; it's missing above because I copied the command from my current run, where I'm training from my own new checkpoints. Apologies for that.


Hey @ftyers, I followed your suggestions and was able to reduce my loss considerably. Thanks for that.

However, after about 80 epochs it's showing abnormal behaviour (or so I think), and the validation loss isn't decreasing below 107.

Since Colab runtimes stop after 24 hours even in the Pro version, I trained for 24 or so epochs at a time and had to resume from the saved checkpoint, and I removed --drop_source_layers since I was training from my own checkpoint.

Right now, my run looks like this:

/content/DeepSpeech
I0505 22:14:59.644571 139647931729792 utils.py:157] NumExpr defaulting to 4 threads.
I Loading best validating checkpoint from /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2206411
I Loading variable from checkpoint: beta1_power
I Loading variable from checkpoint: beta2_power
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam
I Loading variable from checkpoint: cudnn_lstm/opaque_kernel/Adam_1
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/bias/Adam
I Loading variable from checkpoint: layer_1/bias/Adam_1
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_1/weights/Adam
I Loading variable from checkpoint: layer_1/weights/Adam_1
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/bias/Adam
I Loading variable from checkpoint: layer_2/bias/Adam_1
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_2/weights/Adam
I Loading variable from checkpoint: layer_2/weights/Adam_1
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/bias/Adam
I Loading variable from checkpoint: layer_3/bias/Adam_1
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_3/weights/Adam
I Loading variable from checkpoint: layer_3/weights/Adam_1
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/bias/Adam
I Loading variable from checkpoint: layer_5/bias/Adam_1
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_5/weights/Adam
I Loading variable from checkpoint: layer_5/weights/Adam_1
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/bias/Adam
I Loading variable from checkpoint: layer_6/bias/Adam_1
I Loading variable from checkpoint: layer_6/weights
I Loading variable from checkpoint: layer_6/weights/Adam
I Loading variable from checkpoint: layer_6/weights/Adam_1
I Loading variable from checkpoint: learning_rate
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 1:52:18 | Steps: 9736 | Loss: 65.438248    
Epoch 0 | Validation | Elapsed Time: 0:29:19 | Steps: 2087 | Loss: 110.117400 | Dataset: /content/dev.csv
I Saved new best validating model with loss 110.117400 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2216147
--------------------------------------------------------------------------------
Epoch 1 |   Training | Elapsed Time: 0:37:36 | Steps: 9736 | Loss: 70.003445    
Epoch 1 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.734356 | Dataset: /content/dev.csv
I Saved new best validating model with loss 109.734356 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2225883
--------------------------------------------------------------------------------
Epoch 2 |   Training | Elapsed Time: 0:37:45 | Steps: 9736 | Loss: 69.737636    
Epoch 2 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.221892 | Dataset: /content/dev.csv
I Saved new best validating model with loss 109.221892 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2235619
--------------------------------------------------------------------------------
Epoch 3 |   Training | Elapsed Time: 0:37:41 | Steps: 9736 | Loss: 69.273269    
Epoch 3 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.556368 | Dataset: /content/dev.csv
I Saved new best validating model with loss 108.556368 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2245355
--------------------------------------------------------------------------------
Epoch 4 |   Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 68.330056    
Epoch 4 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.450104 | Dataset: /content/dev.csv
I Saved new best validating model with loss 108.450104 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2255091
--------------------------------------------------------------------------------
Epoch 5 |   Training | Elapsed Time: 0:37:49 | Steps: 9736 | Loss: 68.186690    
Epoch 5 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.932049 | Dataset: /content/dev.csv
I Saved new best validating model with loss 107.932049 to: /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/best_dev-2264827
--------------------------------------------------------------------------------
Epoch 6 |   Training | Elapsed Time: 0:37:38 | Steps: 9736 | Loss: 67.422159    
Epoch 6 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.247110 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 7 |   Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 67.124132    
Epoch 7 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.619550 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 8 |   Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 66.913915    
Epoch 8 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.552628 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 9 |   Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 66.194061    
Epoch 9 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.298979 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 10 |   Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 73.806196   
Epoch 10 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.055614 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 11 |   Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 73.326642   
Epoch 11 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.570137 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 12 |   Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 73.448959   
Epoch 12 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.844521 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 13 |   Training | Elapsed Time: 0:37:46 | Steps: 9736 | Loss: 73.072213   
Epoch 13 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.568916 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 14 |   Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 72.634806   
Epoch 14 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.015372 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 15 |   Training | Elapsed Time: 0:37:36 | Steps: 9736 | Loss: 72.474703   
Epoch 15 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.031187 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 16 |   Training | Elapsed Time: 0:37:42 | Steps: 9736 | Loss: 71.875592   
Epoch 16 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.453461 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 17 |   Training | Elapsed Time: 0:37:38 | Steps: 9736 | Loss: 71.264576   
Epoch 17 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.374969 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 18 |   Training | Elapsed Time: 0:37:44 | Steps: 9736 | Loss: 70.779961   
Epoch 18 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.202204 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 19 |   Training | Elapsed Time: 0:37:37 | Steps: 9736 | Loss: 70.823069   
Epoch 19 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 109.686278 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 20 |   Training | Elapsed Time: 0:37:39 | Steps: 9736 | Loss: 70.850567   
Epoch 20 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.624760 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 21 |   Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 70.375931   
Epoch 21 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.851938 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 22 |   Training | Elapsed Time: 0:37:40 | Steps: 9736 | Loss: 70.420728   
Epoch 22 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 108.534752 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 23 |   Training | Elapsed Time: 0:37:37 | Steps: 9736 | Loss: 69.738112   
Epoch 23 | Validation | Elapsed Time: 0:03:50 | Steps: 2087 | Loss: 108.291176 | Dataset: /content/dev.csv
--------------------------------------------------------------------------------
Epoch 24 |   Training | Elapsed Time: 0:37:43 | Steps: 9736 | Loss: 69.809616   
Epoch 24 | Validation | Elapsed Time: 0:03:49 | Steps: 2087 | Loss: 107.992116 | Dataset: /content/dev.csv

As you can see, it seems to be stuck around this loss. What should I do?

If the validation loss isn’t going down, it’s overfitting. You can try fiddling with SpecAugment or increasing dropout. The problem could also be in your data, e.g. it’s too biased towards a single speaker. But without knowing more about it (number of speakers, number of clips, number of hours etc.) it’s difficult to say.

If only the validation loss were not decreasing, it would clearly be over-fitting and I would apply the solutions for that. But if you notice, the training loss is also fluctuating, and I don't understand the reason for that.

Also, about my data: there are about 20 different speakers in it, and I shuffled all the audio so that all of them appear in the training, validation, and test sets.

The test/dev/train sets should be disjoint in terms of speakers, i.e. the same speaker should not appear in any two of them. You can expect some jitteriness in the training because of SpecAugment.


Hey, I have noticed that my model is converging very slowly, so I want to decrease the learning rate but keep training from my last saved checkpoint.

I've seen that the learning rate is loaded from the last checkpoint, so does that mean the lr value I give in my command will have no effect and the last lr value will be used instead? If I am wrong about something, kindly guide me. Thanks.

In principle the one you give in the command should be the one that is used.
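For example, resuming from your saved checkpoints with a lower rate would just be your usual command with the flag changed, something like the following, where the 0.00005 value is only an illustration (note there's no --drop_source_layers, since you're loading your own checkpoint):

    !python3 DeepSpeech.py --train_cudnn True --early_stop False \
      --n_hidden 2048 --epochs 100 \
      --learning_rate 0.00005 --dropout_rate 0.2 \
      --alphabet_config_path /content/drive/MyDrive/alphabet-urdu.txt \
      --load_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --save_checkpoint_dir /content/drive/MyDrive/model_checkpoints/deepspeech-0.9.3-checkpoint/ \
      --train_files /content/train.csv \
      --dev_files /content/dev.csv \
      --test_files /content/test.csv \
      --scorer_path /content/drive/MyDrive/urdu_scorer/kenlm-urdu.scorer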


It might be a good idea to join us on Matrix.

Okay, I didn't see any difference in speed or the like, which is why I asked. After restarting training from scratch (after removing the bias towards one speaker, as you pointed out), I do see a difference.

And sure, I'll check out Matrix. Thanks so much!

Hey @fty, the link you provided to your models here does not load anymore. Can you see why that is?

Hi there! Unfortunately I ran out of time on the server. I only rented it for one month to do the sprint 🙂


I want to fine-tune the English pre-trained model (0.9.3) on Indian English accents. Can I use transfer learning from the same checkpoints (of the pre-trained model) by using the --drop_source_layers flag? If not, how can I use transfer learning, given that my alphabet file is different from the alphabet file of the pre-trained model?