Best place to start with TTS in general

Hey, I was wondering if someone could give me some pointers on where to start. Which vocoder should I pick? Are they all equally easy/hard to use? Is there a clear ranking of which one produces the best output?

Also, it might be cool to turn on Discussions on GitHub.

To answer these, we need more details about your specific requirements. It is hard to guess.

Adding to erogol: if you have some technical expertise, start with any notebook from the wiki, like the Multi-Speaker-Tacotron2 DDC one, to see what inference is like and what components are needed. And give us some more information on what you are planning to do.

I assume your question is related to Mozilla TTS. I like the tts-papers repo as a starting point. Depending on what you want to achieve, the remarks added there or the papers themselves contain important information, like training/inference speed, MOS, etc.


Thank you all for the replies.

@erogol I'm trying to produce high-quality text-to-speech audio. It doesn't have to be real time; the higher the quality, the better.

@othiele thanks for pointing me towards the wiki on GitHub. I hadn't even seen it before. The wiki function of GitHub is so rarely used that I never check whether there is one.

But first of all, I need to learn some more of the vocabulary around this.

@TheDayAfter the tts-papers repo sounds good, but it is also a lot of unsorted info. I think I will start with one of the videos.

Tacotron2 + WaveGrad or WaveRNN is what you need.

