SV2TTS support

While looking into CorentinJ’s SV2TTS implementation, I came across a comment where he mentions SV2TTS is actually implemented in Mozilla TTS.

Specifically, he mentions that @erogol used parts of his code in the Mozilla TTS implementation:

Last I checked, erogol had a lot of features from different papers implemented, including sv2tts. In fact he’s even copied some code from my repo.

However, I cannot find any reference to an SV2TTS implementation in the Mozilla TTS repo.

Does anyone know more about whether SV2TTS is currently supported?

Yes, we have a speaker encoder you can use for that, but I DID NOT copy his code.

Thanks for your fast reply!

It looks like the speaker encoder is an implementation of this paper:

However, I was referring to this paper instead. Is that something you support?

We don’t have a direct implementation yet, but @edresson is actively working on it. You can check his fork.

Great, thanks! I see that the last commit on that repo was a year ago, so if you want some help to get restarted, just let me know @edresson1

This is the active branch.

Edresson has a notebook that can be used to extract embeddings using the encoder from CorentinJ and use them for training (using Edresson’s repo). You can check it out, it works nicely :grinning:
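
For anyone wondering what "extract embeddings and use them for training" can look like once the encoder has produced one vector per utterance, here is a rough sketch. It is not taken from Edresson's notebook or from either repo: the function names (l2_normalize, speaker_embedding, cosine_similarity) are hypothetical, and the averaging scheme (normalise each utterance embedding, average, normalise again) is only an assumption about a common way to get one vector per speaker. The encoder itself is not called here; the inputs are plain lists of floats.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not a repo API)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def speaker_embedding(utterance_embeddings):
    """Combine per-utterance embeddings into one per-speaker vector.

    Assumption: each utterance embedding is normalised, the results are
    averaged, and the mean is normalised again.
    """
    if not utterance_embeddings:
        raise ValueError("need at least one utterance embedding")
    dim = len(utterance_embeddings[0])
    if any(len(e) != dim for e in utterance_embeddings):
        raise ValueError("all embeddings must have the same dimension")
    normalized = [l2_normalize(e) for e in utterance_embeddings]
    mean = [sum(column) / len(normalized) for column in zip(*normalized)]
    return l2_normalize(mean)


def cosine_similarity(a, b):
    """Cosine similarity between two embeddings of equal dimension."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real encoder outputs.
    speaker_a = speaker_embedding([[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])
    speaker_b = speaker_embedding([[0.0, 0.2, 0.9], [0.1, 0.0, 1.0]])
    print("A vs A:", cosine_similarity(speaker_a, speaker_a))
    print("A vs B:", cosine_similarity(speaker_a, speaker_b))
```

The per-speaker vector would then be saved next to the audio metadata and fed to the TTS model as a conditioning input during training; how exactly that is wired up depends on the fork you use.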