Hey, I was wondering if someone could give me some pointers on where to start. Which vocoder should I pick? Are they all equally easy (or hard) to use? Is there a clear ranking of which one produces the best output?
Also, it might be nice to turn on Discussions on GitHub.
Adding to what erogol said: if you have some technical expertise, start with one of the notebooks from the wiki, such as Multi-Speaker-Tacotron2 DDC, to see what inference looks like and which components are needed. Also, tell us a bit more about what you are planning to do.
I assume your question is about Mozilla TTS. I like https://github.com/erogol/TTS-papers as a starting point. Depending on what you want to achieve, the remarks added there or the papers themselves contain important information, such as training/inference speed, MOS, etc.
@erogol I’m trying to produce high-quality text-to-speech audio. It doesn’t have to be real-time; the higher the quality, the better.
@othiele Thanks for pointing me to the wiki on GitHub. I hadn’t even noticed it before. GitHub’s wiki feature is so rarely used that I never check whether a project has one.
But first, I need to learn more of the vocabulary around this topic.