So I trained Mozilla TTS with Tacotron2 on a custom dataset. The Griffin-Lim previews are starting to sound really good, although robotic.
I now want to move on to use the ParallelWaveGAN vocoder.
- How do I go about doing that? Is there a notebook or a command to run it?
- What are the prerequisites?
- Do I need to train the ParallelWaveGAN model at all, like I did with the TTS model?
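To check my own understanding of what the vocoder's job is: it replaces Griffin-Lim as the mel-spectrogram-to-waveform step, taking the frames Tacotron2 predicts and generating the raw audio. Here's a toy sketch of the interface I'm imagining — `ToyVocoder` is completely made up (the real ParallelWaveGAN generator is a stack of dilated convolutions trained adversarially), it only shows the shapes in and out.

```python
import torch
import torch.nn as nn

class ToyVocoder(nn.Module):
    """Hypothetical stand-in for a trained neural vocoder.

    Only illustrates the interface: mel spectrogram in, waveform out,
    upsampled by hop_length samples per frame.
    """

    def __init__(self, n_mels=80, hop_length=256):
        super().__init__()
        self.hop_length = hop_length
        # Project each mel frame to hop_length audio samples
        self.proj = nn.Conv1d(n_mels, hop_length, kernel_size=1)

    def forward(self, mel):
        # mel: (batch, n_mels, frames)
        x = self.proj(mel)                                   # (batch, hop_length, frames)
        return x.transpose(1, 2).reshape(mel.size(0), -1)    # (batch, frames * hop_length)

vocoder = ToyVocoder()
mel = torch.randn(1, 80, 100)   # e.g. Tacotron2 output for a short utterance
wav = vocoder(mel)
print(wav.shape)                # torch.Size([1, 25600])
```

If that's roughly right, then my question reduces to: where does the trained vocoder plug into the Mozilla TTS synthesis path, and do I train it on the same audio as the TTS model?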
I just want to understand the full process before starting to refine my original dataset and possibly retrain my TTS model with a larger set and better-quality audio.
I am stuck at the TTS training stage right now.
Any pointers would be greatly appreciated.