Multispeaker development progress

Aha, but that is what I tried. Then I checked the code and saw that, if the multispeaker embeddings option is enabled in the config, it checks whether I also passed --restore_path, and if so, it reads the speakers .json file. But I will definitely try again. Which model would you recommend? I think the one trained with ForwardAttn and fine-tuned with BN would be a good candidate. But should I keep training it with BN, and keep its config file as well?

I integrated the speaker embeddings by editing the Tacotron2 model in models/tacotron2.py. I changed the condition to if num_speakers > 0 (I know it is redundant), then initialized a torch.FloatTensor holding my embeddings, created a lookup table with torch.nn.Embedding.from_pretrained(weight), and froze the layer with self.speaker_embedding.weight.requires_grad = False. Something like this:
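
(Reconstructed from memory as a minimal sketch; the embeddings file name, its shape, and the constructor arguments below are placeholders, not the exact code.)

```python
import numpy as np
import torch
from torch import nn


class Tacotron2(nn.Module):
    def __init__(self, num_speakers, speaker_embedding_file="speaker_embeddings.npy"):
        super().__init__()
        # Changed condition: build the table whenever speaker info is available.
        if num_speakers > 0:
            # Externally computed speaker vectors, shape [num_speakers, embedding_dim].
            weight = torch.FloatTensor(np.load(speaker_embedding_file))
            # Lookup table initialized from the pretrained vectors.
            self.speaker_embedding = nn.Embedding.from_pretrained(weight)
            # Freeze the layer so the vectors stay fixed during fine-tuning.
            self.speaker_embedding.weight.requires_grad = False
```

(Strictly speaking, from_pretrained already freezes the weights by default, so the explicit requires_grad = False is belt-and-braces.)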

Then I think I fine-tuned the LJSpeech model for a while to include the embeddings (or not, I really do not remember), and called it at inference time with --speaker_id 0. It is a hacky approach, but the embeddings did load and did change the prosody, as we saw.
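
For context, the lookup that --speaker_id 0 triggers at inference amounts to something like the snippet below. How the selected vector is actually injected into the decoder is the part I am hand-waving here: tiling it over time and concatenating it to the encoder outputs is just one plausible wiring, and the dimensions are made up for illustration.

```python
import torch
from torch import nn

# Toy dimensions for illustration only.
num_speakers, embedding_dim, time_steps, encoder_dim = 4, 64, 120, 512

# Frozen table, as built above (random vectors stand in for the real embeddings).
speaker_embedding = nn.Embedding.from_pretrained(
    torch.randn(num_speakers, embedding_dim), freeze=True
)

speaker_id = torch.tensor([0])               # what --speaker_id 0 boils down to
speaker_vec = speaker_embedding(speaker_id)  # [1, embedding_dim]

# One plausible conditioning scheme: repeat the speaker vector across the
# encoder time axis and concatenate it to the encoder outputs.
encoder_outputs = torch.randn(1, time_steps, encoder_dim)
speaker_vec = speaker_vec.unsqueeze(1).expand(-1, time_steps, -1)
conditioned = torch.cat([encoder_outputs, speaker_vec], dim=-1)
print(conditioned.shape)  # torch.Size([1, 120, 576])
```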