How to use the TTS models

I fully understand that the model is incomplete. However, I want to try using one of the pre-trained models to generate audio. The issue is that, as someone who has never used a model like this (although I did play around with other TTS systems while I was still on Windows), I have absolutely no idea how to actually use the darn thing. I was wondering if there is an easy way to just use the pre-trained model.

Additionally, since the models take ASCII input (if I understand correctly), does anyone have a good down-converter to go from SSML to ASCII? Or is there a Python API to generate speech?

If it’s not currently possible, that’s fine; I would just like to know if there is a way to get notified when it reaches that point.

Hello @Black-Kitsune-Gold-Tail, welcome to the forum :slight_smile:

With the README being smartened up a bit recently it might not be so immediately obvious how to do what you’re asking, but luckily the info is there, so here are a few pointers:

  1. You need to download a trained model: there are links to those here (this used to be on the README):
    Depending on which of Tacotron or Tacotron2 you want to try, I’d go with one of the last two in the table

  2. Then refer to the methods for testing a model here:

If you want some continuous use, I’d suggest trying the Demo server.
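
Once the Demo server is running, requests to it can be scripted. The sketch below is only a rough starting point: it assumes the server listens on `localhost` port `5002` and exposes a GET endpoint at `/api/tts` that takes a `text` query parameter and returns WAV audio. Those details are assumptions on my part, so check the server code for the version you have. The helper names `build_tts_url` and `synthesize_to_file` are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for the demo server.

    ASSUMPTION: the host, port, and the /api/tts?text=... route are
    guesses at the demo server's defaults; adjust them to your setup.
    """
    query = urlencode({"text": text})  # URL-encodes spaces and punctuation
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize_to_file(text, out_path, host="localhost", port=5002):
    """Request speech for `text` and save the response body to `out_path`.

    ASSUMPTION: the response body is the raw audio (e.g. WAV) bytes.
    """
    with urlopen(build_tts_url(text, host, port)) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path


if __name__ == "__main__":
    # Only works while the demo server is actually running.
    synthesize_to_file("Hello from the demo server.", "hello.wav")
```

Keeping the URL building separate from the network call makes it easy to check the request you are about to send before the server is even up.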

With these pointers you’ll still need to do a bit of digging around, so if you’re not happy setting up Python environments and looking through code and GitHub issues you might struggle, but it should be fairly straightforward if you’re not a complete beginner. Best of luck!
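
On the SSML question: I’m not aware of a converter that ships with the project, so here is a minimal sketch using only the Python standard library, assuming the models want plain ASCII text as suggested in the original post. `ssml_to_ascii` is a hypothetical helper name. It simply throws away all markup, so breaks, prosody, and emphasis hints are lost, and it expects well-formed XML.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def ssml_to_ascii(ssml: str) -> str:
    """Strip SSML tags and return the remaining text as plain ASCII.

    ASSUMPTION: dropping the markup entirely is acceptable; nothing here
    tries to translate <break>, <prosody>, etc. into punctuation.
    """
    root = ET.fromstring(ssml)  # raises ParseError on malformed XML
    # itertext() yields the text of every element, in document order.
    text = "".join(root.itertext())
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop everything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse whitespace runs left behind by the removed tags.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    sample = (
        '<speak>Hello <emphasis>world</emphasis>!'
        '<break time="500ms"/> Café is open.</speak>'
    )
    print(ssml_to_ascii(sample))
```

Characters with no ASCII decomposition are silently removed, so it is worth eyeballing the output before feeding it to a model.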

I’m trying to get one of the pre-trained models going, in the same spirit as the OP, without much luck. I’ve tried a few different configurations to no avail. I set up a project that should be a good starting point for a reproducible nvidia-docker build that does not depend on a local configuration, and I was hoping to get some feedback on getting it to work without producing runtime errors:

I found the Colab notebook in the corresponding GitHub issue here to be helpful.