Multispeaker development progress

Aha, but that is what I tried. Then I checked the code and saw that, if the multispeaker embeddings option is enabled in the config, it checks whether I also passed --restore_path, and if so, it reads the speakers .json file. But I will definitely try again. Which model would you recommend? I think the one trained with ForwardAttn and fine-tuned with BN would be a good candidate. But should I keep training it with BN, and keep its config file as well?

I integrated the speaker embeddings by editing the Tacotron2 model in models/tacotron2.py. I changed the condition to if num_speakers > 0 (I know it is redundant), then initialized a torch.FloatTensor holding my embeddings, created a lookup table with torch.nn.Embedding.from_pretrained(weight), and froze the layer with self.speaker_embedding.weight.requires_grad = False. Something like this:
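
(Reconstructed from memory as a minimal sketch; the embeddings file name, its shape, and the constructor arguments below are placeholders, not the exact code.)

```python
import numpy as np
import torch
from torch import nn


class Tacotron2(nn.Module):
    def __init__(self, num_speakers, speaker_embedding_file="speaker_embeddings.npy"):
        super().__init__()
        # Changed condition: build the table whenever speaker info is available.
        if num_speakers > 0:
            # Externally computed speaker vectors, shape [num_speakers, embedding_dim].
            weight = torch.FloatTensor(np.load(speaker_embedding_file))
            # Lookup table initialized from the pretrained vectors.
            self.speaker_embedding = nn.Embedding.from_pretrained(weight)
            # Freeze the layer so the vectors stay fixed during fine-tuning.
            self.speaker_embedding.weight.requires_grad = False
```

(Strictly speaking, from_pretrained already freezes the weights by default, so the explicit requires_grad = False is belt-and-braces.)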

Then I think I fine-tuned the LJSpeech model for a while to include the embeddings (or not, I really do not remember), and called it at inference time with --speaker_id 0. It is a hacky approach, but the embeddings did load and did change the prosody, as we saw.
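
For context, the lookup that --speaker_id 0 triggers at inference amounts to something like the snippet below. How the selected vector is actually injected into the decoder is the part I am hand-waving here: tiling it over time and concatenating it to the encoder outputs is just one plausible wiring, and the dimensions are made up for illustration.

```python
import torch
from torch import nn

# Toy dimensions for illustration only.
num_speakers, embedding_dim, time_steps, encoder_dim = 4, 64, 120, 512

# Frozen table, as built above (random vectors stand in for the real embeddings).
speaker_embedding = nn.Embedding.from_pretrained(
    torch.randn(num_speakers, embedding_dim), freeze=True
)

speaker_id = torch.tensor([0])               # what --speaker_id 0 boils down to
speaker_vec = speaker_embedding(speaker_id)  # [1, embedding_dim]

# One plausible conditioning scheme: repeat the speaker vector across the
# encoder time axis and concatenate it to the encoder outputs.
encoder_outputs = torch.randn(1, time_steps, encoder_dim)
speaker_vec = speaker_vec.unsqueeze(1).expand(-1, time_steps, -1)
conditioned = torch.cat([encoder_outputs, speaker_vec], dim=-1)
print(conditioned.shape)  # torch.Size([1, 120, 576])
```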