I’m attempting to use TTS to fine-tune a Tacotron2 TTS model. If it makes a difference, I’m using Python 3.9.1 and I’m fine-tuning the latest tts_models--en--ljspeech--tacotron2-DDC.
During fine-tuning, when the pretrained model is loaded, I get a series of errors indicating "Layer missing in the checkpoint".
Then it says
| > 81 / 105 layers are restored.
> Model restored from step 278000
> Model has 47669492 parameters
> Number of output frames: 2
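In case it helps with debugging, here’s a small snippet to list the layer names stored in the pretrained checkpoint so they can be compared against the "Layer missing" messages. It assumes the file is a standard PyTorch checkpoint that keeps its weights under a "model" key, which is what Coqui checkpoints normally use (the path is just my local one from the command below):

import torch

# Path to the pretrained checkpoint I'm restoring from (same as in the command below).
ckpt_path = "../.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC/model_file.pth"

checkpoint = torch.load(ckpt_path, map_location="cpu")
# Assumption: the weights live under a "model" key; fall back to the raw dict otherwise.
state_dict = checkpoint.get("model", checkpoint)

print(f"{len(state_dict)} layers in the checkpoint:")
for name, tensor in state_dict.items():
    print(f"  {name}: {tuple(tensor.shape)}")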
I’ve checked that I’m using the correct model version per the TTS docs, and everything else seems to be in order.
Training does run, but the wav file synthesized by the fine-tuned model doesn’t contain any audible sound.
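To rule out a playback issue, a quick amplitude check confirms whether the file is truly silent rather than just very quiet. This uses the soundfile package, and "out.wav" is a placeholder for the file synthesized by the fine-tuned model:

import numpy as np
import soundfile as sf

# "out.wav" is a placeholder for the wav produced by the fine-tuned model.
wav, sample_rate = sf.read("out.wav")
duration = len(wav) / sample_rate
peak = np.abs(wav).max()

print(f"sample rate: {sample_rate} Hz, duration: {duration:.2f} s, peak amplitude: {peak:.6f}")
# A peak amplitude of (almost) 0.0 means the file is genuinely silent, not just quiet.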
Any ideas or suggestions are welcome.
Here’s the exact command line I’m using:
CUDA_VISIBLE_DEVICES="0" python ./TTS/recipes/ljspeech/tacotron2-DDC/train_tacotron_ddc.py --restore_path ../.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC/model_file.pth --config_path ../.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC/config.json
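For completeness, this is roughly how I synthesize from the fine-tuned checkpoint afterwards, using the standard tts CLI (the run folder and checkpoint name are placeholders for whatever the trainer writes to the output directory):

tts --text "This is a test sentence." \
    --model_path ./output/<run_folder>/best_model.pth \
    --config_path ./output/<run_folder>/config.json \
    --out_path out.wav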