Transfer learning by training only the last few layers


(Rpratesh) #1

Hi,

I wanted to continue training from the provided DeepSpeech 0.3 checkpoints with my own recorded audio files.
For now, I want to train only the last few layers (e.g. the fully connected layers and the last few RNN layers) while freezing the rest of the layers. What changes are needed in the code to do that?

Also, a request for advice: is it better to train only the last few layers or the entire network, given that the dataset is just another accent of English with a few extra words that were not previously in the vocabulary?

Thanks


(Rpratesh) #2

Any suggestions on this, @lissyx, @Tilman_Kamp?


(Lissyx) #3

Joshua is already working on that, but it’s still early: https://github.com/mozilla/DeepSpeech/tree/transfer-learning


(Rpratesh) #4

@lissyx Can you tag Joshua’s handle in this thread, so that we can have the discussion on this topic here?


#5

Hey there, @rpratesh!

Try checking out the branch that @lissyx mentioned (transfer-learning).
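That is, roughly:

    git clone https://github.com/mozilla/DeepSpeech.git
    cd DeepSpeech
    git checkout transfer-learning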

You will need to add the following files for your new accent:

DeepSpeech/
    data/
        alphabet.txt
        lm/
            lm.binary
            trie

(1) Use util/check_characters.py to find the unique characters in your new accent’s transcripts (see the example commands after these steps).

(2) Use util/lm.sh from the transfer-learning branch to generate the lm.binary and trie files: https://github.com/mozilla/DeepSpeech/commit/695d132c95a9d3f89cc7bb8f7e1ec1323a08a6e3#diff-f94859e8f7dad3173a79bcf73b14032f

(3) Train with DeepSpeech.py, making sure the paths match up to your local data. I’d try replacing the top layer and maybe the second one as well; going any deeper probably won’t make sense for just accent transfer.
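For (1) and (2), the commands look roughly like this. This is an untested sketch; the exact arguments of check_characters.py and the paths inside lm.sh may differ on your checkout, so check their --help / source first:

    # (1) list the characters that occur in your new transcripts,
    #     so you can extend data/alphabet.txt if anything is missing
    python util/check_characters.py -csv /path/to/accent_train.csv

    # (2) rebuild lm.binary and the trie from your vocabulary text;
    #     edit lm.sh so its corpus and output paths point at your own data
    bash util/lm.sh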


(Rpratesh) #6

Hi,
I’ve checked out the “transfer-learning” branch and generated my own lm.binary and trie files.
I’ve trained with two extra parameters, i.e.

--drop_source_layers 2 --source_model_checkpoint_dir "$MyCkptDir"
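For reference, my full invocation looked roughly like this; the paths are placeholders and the remaining flags are just my usual training settings, so treat it as a sketch rather than the exact command:

    python DeepSpeech.py \
      --n_hidden 2048 \
      --alphabet_config_path data/alphabet.txt \
      --lm_binary_path data/lm/lm.binary \
      --lm_trie_path data/lm/trie \
      --train_files /path/to/accent_train.csv \
      --dev_files /path/to/accent_dev.csv \
      --test_files /path/to/accent_test.csv \
      --drop_source_layers 2 \
      --source_model_checkpoint_dir "$MyCkptDir" \
      --checkpoint_dir /path/to/new_checkpoints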

Are the above two parameters correct and sufficient for doing transfer learning by unfreezing only the last two layers?

Also, the model seems to have converged quickly and is performing well on my test dataset. However, I have a feeling it has over-fitted: it works well on sentences similar to the ones I added to the vocabulary (even from a completely new speaker), but it doesn’t work well when the same speaker says entirely new sentences.
Any suggestions on what I can do to improve this?