Transfer learning by training only the last few layers

rpratesh · November 14, 2018, 8:01am

Hi,

I want to continue training the provided DeepSpeech 0.3 checkpoints with my own recorded audio files.
For now, I want to train only the last few layers (such as the fully connected layers and the last few RNN layers) while freezing the rest. What changes are needed in the code to do that?

Also, I would like some advice: is it better to train only the last few layers or the entire network, given that my dataset is just another accent of English with a few extra words that were not previously in the vocabulary?

Thanks

rpratesh · November 21, 2018, 6:40am

Any suggestions on this, @lissyx @Tilman_Kamp?

lissyx · November 21, 2018, 7:09am

Joshua is already working on that, but it’s still early: https://github.com/mozilla/DeepSpeech/tree/transfer-learning

rpratesh · November 21, 2018, 2:59pm

@lissyx Can you tag Joshua’s handle in this thread, so that we can discuss this topic here?

josh_meyer · November 21, 2018, 9:29pm

Hey there, @rpratesh!

Try checking out that branch that @lissyx mentioned (transfer-learning).

You will need to add the following files for your new accent:

DeepSpeech/
       data/
             alphabet.txt
             lm/
                   lm.binary
                   trie

(1) Use util/check_characters.py to find the unique characters in your new accent transcripts.

(2) Use util/lm.sh from the transfer-learning branch to generate the trie & lm.binary files https://github.com/mozilla/DeepSpeech/commit/695d132c95a9d3f89cc7bb8f7e1ec1323a08a6e3#diff-f94859e8f7dad3173a79bcf73b14032f

(3) Train with DeepSpeech.py, making sure the paths match your local data. I’d try replacing the top layer and maybe the second layer. Going any deeper probably won’t make sense for accent transfer alone.
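
For step (1), the idea is to collect every distinct character in the transcripts so that alphabet.txt covers all of them. Below is a minimal sketch of that idea. It is not the actual util/check_characters.py; the function name is hypothetical, and the CSV layout with a transcript column is an assumption about how the training data is stored.

```python
import csv
import io


def unique_characters(csv_file, column="transcript"):
    """Return the sorted list of distinct characters found in one CSV column.

    `column` is assumed to hold the transcript text; adjust it to your data.
    """
    chars = set()
    for row in csv.DictReader(csv_file):
        chars.update(row[column])
    return sorted(chars)


# Tiny in-memory example standing in for a real training CSV file.
sample = io.StringIO(
    "wav_filename,wav_filesize,transcript\n"
    "a.wav,1000,hello there\n"
    "b.wav,1200,namaste\n"
)
alphabet = unique_characters(sample)
print(alphabet)
```

Any character in the result that is missing from alphabet.txt would need to be added there, or removed from the transcripts, before training.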

rpratesh · November 29, 2018, 9:31am

Hi,
I’ve checked out the “transfer-learning” branch and generated my own lm.binary and trie files.
I’ve trained with two extra parameters:

--drop_source_layers 2 --source_model_checkpoint_dir "$MyCkptDir"

Are these two parameters correct and sufficient for transfer learning that unfreezes only the last two layers?

The model seems to have converged quickly and performs well on my test dataset. However, I have a feeling that it has overfitted: it works well on sentences similar to the ones I added to the vocabulary (even from a completely new speaker), but it doesn’t work well when the same speaker says entirely new sentences.
Any suggestions on what I can do to improve this?

josh_meyer · December 18, 2018, 4:59am

Try early stopping with a shorter window
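
As a rough illustration of what a shorter window changes, here is a generic sketch of windowed early stopping. It is not DeepSpeech’s implementation; the function names, the `window` and `min_delta` parameters, and the loss values are all made up for the example.

```python
def should_stop(val_losses, window=3, min_delta=0.0):
    """Return True when none of the last `window` validation losses
    improved on the best earlier loss by more than `min_delta`."""
    if len(val_losses) <= window:
        return False  # not enough history to compare against yet
    best_before = min(val_losses[:-window])
    best_recent = min(val_losses[-window:])
    return best_recent >= best_before - min_delta


def stop_epoch(val_losses, window):
    """Return the first (1-based) epoch at which training would stop, or None."""
    for epoch in range(1, len(val_losses) + 1):
        if should_stop(val_losses[:epoch], window):
            return epoch
    return None


# Hypothetical validation losses that bottom out at epoch 3 and then drift up.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74, 0.75]
short_window_stop = stop_epoch(losses, window=2)
long_window_stop = stop_epoch(losses, window=4)
print(short_window_stop, long_window_stop)
```

With the shorter window, training halts sooner after the validation loss stops improving, which leaves the model less time to fit the small new dataset too closely.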

rpratesh · January 30, 2019, 1:54pm

Hi @josh_meyer,

If we look at DeepSpeech’s architecture, it has 3 FC layers, then one BiLSTM layer, followed by one FC layer.
So, if I am training the last two layers using the transfer-learning branch that @josh_meyer mentioned, would that also modify the weights of the BiLSTM layer (since it is the last-but-one layer in that architecture)?

Also @josh_meyer @lissyx @kdavis,
Can you suggest a better training method, so that the model retains the American accent but also performs well on other accents?
The problem I am facing is that if I train the last one or two layers with only Indian-accent data, the model forgets the previous accents.

rpratesh · February 1, 2019, 9:29am

Hi,
Any help regarding this query?

reuben · February 1, 2019, 11:06am

The terminology in the transfer learning branch includes the output layer in the count, so last two means the output layer and final hidden layer.

rpratesh · February 1, 2019, 11:23am

Does “output layer” mean the CTC decode layer?

reuben · February 1, 2019, 11:43am

It means the last layer in the model, the one that has the same size as your alphabet.
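
The counting convention described here can be sketched as follows. The layer names are placeholders, not identifiers from the transfer-learning branch, and the helper is hypothetical. It only shows which layers a given count covers; what the branch then does with each group is not modelled.

```python
def split_layers(layers, drop_source_layers):
    """Split an ordered layer list (input to output) into the layers left
    untouched and the layers covered by the count.

    The count includes the output layer, so 1 means the output layer only.
    """
    if not 0 <= drop_source_layers <= len(layers):
        raise ValueError("drop_source_layers must be between 0 and len(layers)")
    keep = len(layers) - drop_source_layers
    return layers[:keep], layers[keep:]


# Placeholder names; the last entry is the layer sized like the alphabet.
layers = ["hidden_1", "hidden_2", "hidden_3", "hidden_4", "output"]
kept, counted = split_layers(layers, 2)
print("kept:", kept)
print("counted:", counted)
```

With a count of 2, the covered layers are the output layer and the hidden layer directly before it, which matches the explanation above.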

sayantangangs.91 · March 9, 2019, 2:53am

Hey @rpratesh, how has transfer learning worked for you? I’m stuck trying to resume training from a checkpoint, and it is causing some problems. Looking forward to hearing from you.

kondaraunak · March 30, 2019, 8:02am

Sir, I am trying to continue training the existing model from the released 0.4.1 checkpoints using my own data.
However, the training log shows it resuming from a different epoch number for each dataset: 10232 for one, 45000 for another, and 1832 for a third.
(I expected it to start from epoch 31, since the releases page says the model was trained for 30 epochs.)

Any idea why this happens, sir?

Also, how do I use your transfer-learning branch? Could you please elaborate?