Fine-Tuning a Trained Model on a French Dataset


I am new here and I'm fine-tuning the trained Tacotron2 model checkpoint from branch 20a6ab3d61.

I am using the French dataset from M-AILABS.

The config file I used is the one that comes with branch 20a6ab3d61; I only changed the following:

CONFIG["audio"]["sample_rate"] = 16000  # to match my dataset's audio
CONFIG["phoneme_language"] = "fr-fr"    # French phonemes

CONFIG["prenet_type"] = "bn"            # because of my initial checkpoint
CONFIG["prenet_dropout"] = True         # because of my initial checkpoint

CONFIG["seq_len_norm"] = True           # because of the varying audio lengths in my dataset (I think)

After 5000 steps of training I have the following results:

When I test the model with short French sentences the result is acceptable, and I believe it will improve a lot after maybe another 10,000 steps.
However, I think the model has a problem with long sentences!

I would be grateful if someone could help me answer the following questions:

  • first and most importantly: what is that green area in the ground truth?!
    I believe it is causing the problem with long sentences. Is there any solution? (I checked many samples of my dataset with CheckSpectrograms.ipynb and everything seems fine there!)

  • the Tacotron2 model checkpoint_670000.pth.tar that I am fine-tuning was trained on a dataset with a sample rate of 22,050 Hz, while my dataset's audio has a sample rate of 16,000 Hz.
    Can that be a problem?

  • how do I set my test sentences for TensorBoard in "test_sentences_file"? (I tried creating a text file and writing its path in the "test_sentences_file" param, but it did not work.) (I need to set the test sentences manually because I think the model is generating English sentences, not French.)

  • I am using Google Colab for the training, so I'd like to know whether stopping every 11 hours and re-training from the last checkpoint, instead of training without interruptions, can affect the results.

  • The newly created model reads numbers in English with a French accent xD! Any suggestions?
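For the numbers issue, the workaround I have in mind is expanding digits to French words in the text before it reaches the model. A minimal sketch of the idea (the dictionary only covers small numbers; real French number rules, e.g. 70 = "soixante-dix", would need a proper library like num2words):

```python
# Minimal sketch of a text normalizer that expands digits to French words,
# so the model sees "deux" instead of "2". Only covers 0-16 here; a full
# solution needs complete French number rules.
import re

FR_NUMBERS = {
    "0": "zéro", "1": "un", "2": "deux", "3": "trois", "4": "quatre",
    "5": "cinq", "6": "six", "7": "sept", "8": "huit", "9": "neuf",
    "10": "dix", "11": "onze", "12": "douze", "13": "treize",
    "14": "quatorze", "15": "quinze", "16": "seize",
}

def expand_numbers_fr(text):
    """Replace small integers in `text` with their French spelling."""
    return re.sub(r"\d+", lambda m: FR_NUMBERS.get(m.group(), m.group()), text)

print(expand_numbers_fr("J'ai 2 chats et 15 poissons"))
# -> "J'ai deux chats et quinze poissons"
```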

Any other suggestions or tips are much appreciated!

thanks in advance !!


@nmstoker, @carlfm01: from what I have read on this forum, your answers are very interesting, and I'd really appreciate the chance to learn from your great experience in the field!


  • The line you see during eval time is padding. You can ignore it. It is not what is creating problems.

  • Not a problem, but you have to set the window lengths in the config according to your dataset.

  • That is the right way to do it. Try using the absolute path to the file.

  • It is fine to resume training from checkpoints; I do it all the time.

  • You need to edit the normalizer in the utils folder.
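For the window-length point, here is a sketch of what I mean: keep the STFT windows the same duration after changing the sample rate. The key names and the path are assumptions based on typical configs; check your branch's config.json.

```python
# Sketch: recompute window/hop sizes in samples so their duration in
# milliseconds stays the same at the new sample rate.
CONFIG = {"audio": {}}       # stands in for the loaded config dict

SAMPLE_RATE = 16000          # the new dataset's rate (checkpoint used 22050)
FRAME_LENGTH_MS = 50.0       # analysis window duration
FRAME_SHIFT_MS = 12.5        # hop between windows

win_length = int(SAMPLE_RATE * FRAME_LENGTH_MS / 1000)  # 800 samples
hop_length = int(SAMPLE_RATE * FRAME_SHIFT_MS / 1000)   # 200 samples

CONFIG["audio"]["sample_rate"] = SAMPLE_RATE
CONFIG["audio"]["win_length"] = win_length
CONFIG["audio"]["hop_length"] = hop_length

# And for the TensorBoard test sentences, an absolute path is safest
# (this path is just an example):
CONFIG["test_sentences_file"] = "/content/test_sentences_fr.txt"
```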

Good luck!


@georroussos thanks a lot, clear and helpful answers!

What is weird is that the padding disappeared after another 5000 steps, and the model is a bit better with long sentences!

But it is too fast and it does not respect commas. Is there a way to fix that, please?

5000 steps is too soon. Wait a bit more.


@ilyes_ben_yahia, are you unfreezing the entire model or just certain layers? Asking since I'm doing something similar with German.