Best place to start with TTS in general

Hey, I was wondering if someone could give me some pointers on where to start. Which vocoder should I pick? Are they all equally easy (or hard) to use? Is there a clear ranking of which one produces the best output?

Also, it might be cool to turn on Discussions on GitHub.

To answer these we need more clues about your specific requirements. It is hard to guess.


Adding to what erogol said: if you have some technical expertise, start with any notebook from the wiki, such as the Multi-Speaker-Tacotron2 DDC one, to see what inference looks like and which components are needed. Also, please share more about what you are planning to do.


I assume your question is about Mozilla TTS. I like the tts-papers repo as a starting point. Depending on what you want to achieve, the remarks added there or the papers themselves contain important information, such as training and inference speed, MOS, etc.


Thank you all for the replies.

@erogol I'm trying to produce high-quality text-to-speech audio. It doesn't have to be real time; the higher the quality, the better.

@othiele Thanks for pointing me to the wiki on GitHub. I hadn't even noticed it before. The GitHub wiki feature is used so rarely that I never check whether a project has one.

But first I need to learn more of the vocabulary in this field.

@TheDayAfter The tts-papers repo sounds good, but it's also a lot of unsorted information. I think I'll start with one of the videos.

Tacotron2 plus WaveGrad or WaveRNN as the vocoder is what you need.
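To make that split concrete, here is a minimal sketch of the two-stage structure, assuming the usual design where an acoustic model (Tacotron2) turns text into a mel spectrogram and a separate vocoder (WaveGrad, WaveRNN, etc.) turns that spectrogram into audio samples. The class `TTSPipeline`, the `dummy_*` functions, and the values `N_MELS = 80` and `HOP_LENGTH = 256` are hypothetical placeholders, not the Mozilla TTS API, and the stubs produce silence rather than speech. The point is only that the vocoder is a swappable stage, which is why you can compare vocoders against the same Tacotron2 output.

```python
from dataclasses import dataclass
from typing import Callable, List

# A mel spectrogram as a list of frames; each frame holds N_MELS floats.
MelSpectrogram = List[List[float]]

N_MELS = 80       # assumed number of mel channels (a common choice)
HOP_LENGTH = 256  # assumed number of audio samples per mel frame


@dataclass
class TTSPipeline:
    """Hypothetical two-stage TTS: acoustic model -> vocoder."""
    acoustic_model: Callable[[str], MelSpectrogram]  # e.g. Tacotron2
    vocoder: Callable[[MelSpectrogram], List[float]]  # e.g. WaveGrad or WaveRNN

    def synthesize(self, text: str) -> List[float]:
        mel = self.acoustic_model(text)  # stage 1: text -> mel frames
        return self.vocoder(mel)         # stage 2: mel frames -> waveform


def dummy_acoustic_model(text: str) -> MelSpectrogram:
    # Placeholder: pretend every character yields 5 silent mel frames.
    return [[0.0] * N_MELS for _ in range(len(text) * 5)]


def dummy_vocoder(mel: MelSpectrogram) -> List[float]:
    # Placeholder: a real vocoder upsamples each frame to HOP_LENGTH samples.
    return [0.0] * (len(mel) * HOP_LENGTH)


pipeline = TTSPipeline(dummy_acoustic_model, dummy_vocoder)
audio = pipeline.synthesize("Hello")
print(len(audio))  # 5 chars * 5 frames * 256 samples = 6400
```

Swapping vocoders then means passing a different callable as `vocoder` while keeping the same acoustic model.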
