Is it possible to train a TTS model in a custom language (Latin) with only a couple of hours of good-quality training data?

I am completely new to ML and TTS, and I am trying to learn how to build text-to-speech. For now I have maybe 20-30 phrases, but I plan to train further as I collect more data. Is this feasible in a custom language? My current plan is:

  1. Create an LJSpeech-style dataset from my 20-30 phrases
  2. Skip adding my custom alphabet, since I plan to train on phonemes
  3. Write a text cleaner
  4. Work out how to set the parameters (how do I figure this out?)
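
For steps 1 and 3, here is a rough sketch of what a cleaner and a metadata row could look like. It assumes the usual LJSpeech layout (a `metadata.csv` with `id|raw text|normalized text` rows next to a `wavs/` folder). The names `clean_latin` and `metadata_line` are made up for illustration, and the cleaning rules (lowercasing, dropping macrons, keeping only basic punctuation) are my own assumptions, not a standard for Latin.

```python
import re
import unicodedata


def clean_latin(text: str) -> str:
    """Hypothetical cleaner: lowercase, strip diacritics, collapse whitespace."""
    # Decompose accented letters so that marks such as macrons become separate code points.
    text = unicodedata.normalize("NFD", text.lower())
    # Drop combining marks. This is an assumption: keep them if vowel length matters to you.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace anything that is not a plain letter, space, or basic punctuation with a space.
    text = re.sub(r"[^a-z ,.;:?!'-]", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def metadata_line(clip_id: str, raw_text: str) -> str:
    """Build one LJSpeech-style row: id|raw transcription|normalized transcription."""
    # The pipe is the field separator, so it must not appear inside the raw text.
    raw = raw_text.replace("|", " ").strip()
    return f"{clip_id}|{raw}|{clean_latin(raw_text)}"


if __name__ == "__main__":
    # "latin_0001" is a placeholder clip id that would match wavs/latin_0001.wav.
    print(metadata_line("latin_0001", "Vēnī, vīdī, vīcī."))
```

Writing one such line per clip into `metadata.csv` gives you the dataset file. If you go the phoneme route from step 2, the normalized column is where the phonemizer input would come from.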

20-30 phrases is far too few. You would need at least 100 samples and a very good model pretrained on the same language.

Would I be able to use 100 Latin phrases with the LJSpeech model? Or can that only be used with English?

Probably not, but you can try.

I found the strategy used in "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis" very interesting as a way to improve Tacotron’s data efficiency. As long as you can find text and audio corpora for Latin, it can be a good starting point.
