Fine-Tuning a Trained Model on a French Dataset


I am new here and I'm fine-tuning the trained Tacotron2 model checkpoint from branch 20a6ab3d61.

I am using the French dataset from M-AILABS.

The config file I used is the one that comes with branch 20a6ab3d61; I only changed the following:

CONFIG["audio"]["sample_rate"] = 16000  # to match my dataset's audio
CONFIG["phoneme_language"] = "fr-fr"    # French phonemes

CONFIG["prenet_type"] = "bn"            # because of my initial checkpoint
CONFIG["prenet_dropout"] = True         # because of my initial checkpoint

CONFIG["seq_len_norm"] = True           # because of the varying audio lengths in my dataset (I think)

After 5000 steps of training I have the following results:

When I test the model with short French sentences the result is acceptable, and I believe it will improve a lot after maybe another 10,000 steps.
However, I think the model has a problem with long sentences!

I would be grateful if someone could help me answer the following questions:

  • first and most importantly: what is that green area in the ground truth?!
    I believe it is causing the problem with long sentences. Is there any solution? (I checked many samples of my dataset with CheckSpectrograms.ipynb and everything seems fine there!)

  • the Tacotron2 model checkpoint_670000.pth.tar that I am fine-tuning was trained on a dataset with a sample rate of 22,050 Hz, while my dataset's audio has a sample rate of 16,000 Hz.
    Can that be a problem?

  • how do I set my test sentences for TensorBoard in "test_sentences_file"? (I tried creating a text file and writing its path in the "test_sentences_file" param, but it did not work.) (I need to set the test sentences manually because I think the model is generating English sentences, not French.)

  • I am using Google Colab for the training, so I'd like to know whether stopping every 11 hours and re-training from the last checkpoint, instead of training without interruptions, can affect the results.

  • The newly created model reads numbers in English with a French accent xD! Any suggestions?
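For the numbers issue, the workaround I have in mind is expanding digits to French words in the text before it reaches the model. A minimal sketch of the idea (the dictionary only covers small numbers; real French number rules, e.g. 70 = "soixante-dix", would need a proper library like num2words):

```python
# Minimal sketch of a text normalizer that expands digits to French words,
# so the model sees "deux" instead of "2". Only covers 0-16 here; a full
# solution needs complete French number rules.
import re

FR_NUMBERS = {
    "0": "zéro", "1": "un", "2": "deux", "3": "trois", "4": "quatre",
    "5": "cinq", "6": "six", "7": "sept", "8": "huit", "9": "neuf",
    "10": "dix", "11": "onze", "12": "douze", "13": "treize",
    "14": "quatorze", "15": "quinze", "16": "seize",
}

def expand_numbers_fr(text):
    """Replace small integers in `text` with their French spelling."""
    return re.sub(r"\d+", lambda m: FR_NUMBERS.get(m.group(), m.group()), text)

print(expand_numbers_fr("J'ai 2 chats et 15 poissons"))
# -> "J'ai deux chats et quinze poissons"
```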

Any other suggestions or tips are much appreciated!

thanks in advance !!


@nmstoker, @carlfm01: from what I have read on this forum, your answers are very interesting, and I'd really appreciate the chance to learn from your great experience in the field!


  • The line you see during eval time is padding. You can ignore it. It is not what is creating problems.

  • Not a problem, but you have to set the window lengths in the config according to your dataset.

  • That is the right way to do it. Try using the absolute path to the file.

  • It is fine to resume training from checkpoints; I do it all the time.

  • You need to edit the normalizer in the utils folder.
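For the window-length point, here is a sketch of what I mean: keep the STFT windows the same duration after changing the sample rate. The key names and the path are assumptions based on typical configs; check your branch's config.json.

```python
# Sketch: recompute window/hop sizes in samples so their duration in
# milliseconds stays the same at the new sample rate.
CONFIG = {"audio": {}}       # stands in for the loaded config dict

SAMPLE_RATE = 16000          # the new dataset's rate (checkpoint used 22050)
FRAME_LENGTH_MS = 50.0       # analysis window duration
FRAME_SHIFT_MS = 12.5        # hop between windows

win_length = int(SAMPLE_RATE * FRAME_LENGTH_MS / 1000)  # 800 samples
hop_length = int(SAMPLE_RATE * FRAME_SHIFT_MS / 1000)   # 200 samples

CONFIG["audio"]["sample_rate"] = SAMPLE_RATE
CONFIG["audio"]["win_length"] = win_length
CONFIG["audio"]["hop_length"] = hop_length

# And for the TensorBoard test sentences, an absolute path is safest
# (this path is just an example):
CONFIG["test_sentences_file"] = "/content/test_sentences_fr.txt"
```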

Good luck!


@georroussos thanks a lot, clear and helpful answers!

What is weird is that the padding disappeared after another 5000 steps, and the model is a bit better with long sentences!

But it is too fast and it does not respect commas. Is there a way to fix that, please?

5000 steps is too soon. Wait a bit more.


@ilyes_ben_yahia, are you unfreezing the entire model or just certain layers? Asking since I'm doing something similar with German.