Portuguese TTS model

I’ve seen this model from Mozilla TTS from Edresson:

Does anybody here know how to perform the training?

It seems that he has used “transfer learning”. Can anyone point to more information on this?

1 Like

I’m guessing @edresson1 is best placed to comment.

I had a very quick peak in his repo just now but didn’t have time to look closely enough to tell how obvious the transfer training details are there.

@kms just curious: are you interested in this for Portuguese or an application to another language? And if so do you have a dataset in mind?

Hi, thanks for the interest :),

I used my dataset called TTS-Portuguese Corpus (https://github.com/Edresson/TTS-Portuguese-Corpus)

The transfer of learning is nothing more than to load the pre-trained model of English and continue training in Portuguese. As the Alphabet is different I made sure that the texts in my dataset were all in lowercase letters.

After I changed the vocabulary here: https://github.com/Edresson/TTS/blob/TTS-Portuguese/utils/text/symbols.py

Basically I substituted capital letters with letters that have an accent in Portuguese.

I recommend using the current implementation and doing something similar as I did to make it possible to use Transfer Learning.

In the past it was not possible to make a model with vocabulary other than English available in the official repository so my model is on a fork.
But now the vocabulary can be provided in config.json and that is possible.


Have you tried to do it from scratch, without transfer learning?

yes, the results are close, but with transfer learning the results are better, and less time is needed.

my model is old, I recommend training Tacotron2 with Graves Attention and prenet BN.

That is very interesting. Were you able to get good results using a pretrained model from a different language then? Because I tried the same (I loaded the pretrained on LJSpeech weights when initiating a training session for Swedish) and I literally got nothing. Although I guess Swedish with English is much more different than English and Portuguese in terms of pronunciation.

Yes, I managed to reduce the training time with transfer learning from another language. For more details see my paper End-To-End Speech Synthesis Applied to Brazilian Portuguese

Experiment 6.1 and 6.2 are trained without transfer learning, however experiment 6.3, 6.4, 7.1, 7.2 are trained with transfer learning and this improved the model’s results. Unfortunately, it was not possible to calculate the MOS for all experiments.