I realise the best answer is for me to simply try it, but thought I’d ask those who’d tried it for some general impressions first.
Q. With multi-speaker models in TTS, is there any discernible influence from one voice on another? For instance, do they start to pick up characteristics from each other, or do the voices remain distinct and true to the originals?
Q. And does a multi-speaker model yield better-quality output than one would get by training on just one of the voices? By this, I mean: is the act of training them all together helping the model learn common characteristics of the language overall, which it then applies to all voices?
I’m wondering whether a multi-speaker model with several relatively similar voices would reach higher quality because they might somehow reinforce each other, or whether it’s actually better to go for quite distinct voices (e.g. different accents or speaking patterns). I’m guessing distinct voices would generally be less likely to reinforce each other (if that happens at all?!), but they may have some other advantage?
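For anyone else wondering why cross-voice influence can happen at all: most multi-speaker TTS architectures share one backbone (text encoder, decoder, etc.) across all voices and distinguish speakers only through a small learned speaker embedding, so every voice's data updates the shared weights. Here's a minimal toy sketch of that idea in numpy — the names, sizes, and linear "model" are purely illustrative, not any real TTS architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-speaker "acoustic model": one shared backbone plus a
# small per-speaker embedding added to the input features.
n_phonemes, n_speakers, hidden, n_mels = 50, 4, 16, 8
phoneme_table = rng.normal(size=(n_phonemes, hidden))  # shared across voices
speaker_table = rng.normal(size=(n_speakers, hidden))  # the ONLY per-voice part
W_shared = rng.normal(size=(hidden, n_mels))           # shared across voices

def synthesize(phoneme_ids, speaker_id):
    x = phoneme_table[phoneme_ids]       # (T, hidden) -- shared lookup
    x = x + speaker_table[speaker_id]    # voice identity injected here
    return x @ W_shared                  # (T, n_mels) toy "mel" output

utterance = [3, 17, 42, 8]               # same text for both speakers
mel_a = synthesize(utterance, speaker_id=0)
mel_b = synthesize(utterance, speaker_id=1)

print(mel_a.shape)                       # (4, 8)
print(np.allclose(mel_a, mel_b))         # False: only the embedding differs
```

Because `phoneme_table` and `W_shared` are trained on everyone's speech, general pronunciation and prosody patterns transfer across voices (the hoped-for reinforcement), while the speaker embedding is the only thing keeping the voices apart — which is also why some leakage of characteristics between similar voices is plausible.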