Fine-tuning Tacotron2 to a new language


I’m currently trying to fine-tune Tacotron2 (originally trained on LJSpeech) for German, but training takes about an hour per epoch and the alignment is improving slowly, if at all.

I’ve been unfreezing the entire model and re-training it. Should I instead unfreeze only the postnet, or some other combination of modules? Please let me know!
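For what it’s worth, selective unfreezing in PyTorch just means toggling `requires_grad` per submodule. Here’s a minimal sketch — `Tacotron2Stub` is a stand-in for the real model, and the submodule names (`encoder`, `decoder`, `postnet`) are assumptions, not the exact attribute names of any particular TTS repo:

```python
# Sketch of selective unfreezing in PyTorch. The stub below stands in for
# the real Tacotron2; its submodule names are illustrative assumptions.
import torch.nn as nn

class Tacotron2Stub(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 8)
        self.decoder = nn.Linear(8, 8)
        self.postnet = nn.Linear(8, 8)

def unfreeze_only(model: nn.Module, trainable: set) -> int:
    """Freeze every parameter, then re-enable those under the named
    top-level submodules. Returns the count of trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for name, module in model.named_children():
        if name in trainable:
            for p in module.parameters():
                p.requires_grad = True
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = Tacotron2Stub()
print(unfreeze_only(model, {"postnet"}))  # → 72 (8*8 weights + 8 biases)
```

The optimizer then only updates the unfrozen parameters (you’d typically also pass `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer so frozen weights aren’t tracked at all).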

Hi @kjk11 - normally there’s no need to make decisions about unfreezing layers, since to fine-tune you simply follow the instructions here: with your updated config file.
The only point to bear in mind is that the directory structure in the dev branch changed recently, so the commands given in the wiki need a minor adjustment to the directories. The --restore_path flag is the one you’d normally use.
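To illustrate, the invocation would look roughly like this — the script name, config file, and checkpoint path are placeholders here, so adjust them to your local directory layout:

```shell
# Resume training from a pretrained checkpoint with your updated config.
# Paths are placeholders, not the repo's actual layout.
python train.py \
    --config_path config_german.json \
    --restore_path ../pretrained/checkpoint_260000.pth.tar
```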

The other thing, if it’s slow, is to double-check that it’s definitely using CUDA and the GPU correctly. The output right after training starts should confirm this (I once had the wrong device ID and sat there for a while wondering why it was so slow compared to before!)
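A quick way to sanity-check this outside of training is to ask PyTorch directly which device it sees (helper name is mine, not from any repo):

```python
# Sanity check: confirm PyTorch can actually see the GPU you intend to use.
import torch

def describe_device() -> str:
    if torch.cuda.is_available():
        idx = torch.cuda.current_device()
        return f"cuda:{idx} ({torch.cuda.get_device_name(idx)})"
    return "cpu (no CUDA device visible -- training will be very slow)"

print(describe_device())
```

If this falls back to CPU, or reports a different device index than you expected, that would explain epoch times in the hours.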

I’ve got to say I haven’t looked seriously at fine-tuning between languages, so maybe your unfreezing approach has more merit there, but I’d probably start by trying it the way it’s documented, unless you’ve been advised to try this approach here and I’ve simply missed it.

Hello @kjk11,
welcome to our community :slight_smile:
I’ve no idea about training across a language switch, but since you’re talking about a German model, maybe this is helpful for you.

Download of my German dataset:

The model training is still in progress (as you can see in the linked thread), but maybe that’s a point to think about before starting a cross-language training.
