Best place to start with TTS in general

Zarazas · December 20, 2020, 10:27pm

Hey, I was wondering if someone could give me some pointers on where to start. Which vocoder should I pick? Are they all equally easy/hard to use? Is there a clear ranking of which one produces the best output?

Also, it might be cool to turn on Discussions on GitHub.

erogol · December 21, 2020, 9:36am

To answer these, we need more clues about your specific requirements. It is hard to guess.

othiele · December 21, 2020, 9:46am

Adding to erogol: if you have some technical expertise, start with any notebook, like the Multi-Speaker-Tacotron2 DDC one from the wiki, to see what inference is like and what components are needed. And give us some more information on what you are planning to do.

TheDayAfter · December 21, 2020, 10:07am

I assume your question is related to Mozilla TTS. I like https://github.com/erogol/TTS-papers as a starting point. Depending on what you want to achieve, the remarks added there or the papers themselves contain important information, such as training/inference speed, MOS, etc.

Zarazas · December 21, 2020, 8:55pm

Thank you all for the replies.

@erogol I’m trying to produce high-quality text-to-speech audio. It doesn’t have to be real time; the higher the quality, the better.

@othiele Thanks for pointing me towards the wiki on GitHub. I hadn’t even seen it before. The wiki function of GitHub is so rarely used that I never check whether there is one.

But first of all, I need to learn some more of the vocabulary around this.

Zarazas · December 21, 2020, 10:46pm

@TheDayAfter The TTS-papers repo sounds good, but it is also a lot of unsorted info. I think I will start with one of the videos.

erogol · December 21, 2020, 9:54pm

Tacotron2 with a WaveGrad or WaveRNN vocoder is what you need.