Issue regarding cleaners

Can someone explain what cleaners are and what updates need to be made for languages other than English (step-by-step changes)…

Have you read the comments in the code and looked through the code?

You’ll likely get a better sense of what they do by experimenting with inputting some text and seeing what comes out of the functions.

Essentially they’re taking the raw transcript and processing it to take out things the TTS system won’t work well with, or to make it easier for it to learn the association between the (processed) text and the audio.

For instance, if we look at the abbreviations cleaner: it would be quite hard for a system to realise that the input “Mr.” should sound like “mister”, so one of the cleaners expands various titles. This one is easy to understand, but depending on the phoneme backend used (such as espeak) you may find this is done for you (i.e. espeak can handle “Mr Jones”, giving the phonemes for “mister Jones”).
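As a rough illustration of how such a cleaner works (the names and abbreviation list here are made up for the example, not the exact ones from the repository), it's usually just a table of regex substitutions applied to the text:

```python
import re

# Illustrative abbreviation cleaner: each (pattern, expansion) pair replaces
# an abbreviated title followed by a period with its spoken form.
_abbreviations = [
    (re.compile(r"\b%s\." % abbr, re.IGNORECASE), expansion)
    for abbr, expansion in [
        ("mr", "mister"),
        ("mrs", "misess"),
        ("dr", "doctor"),
        ("st", "saint"),
    ]
]

def expand_abbreviations(text):
    """Apply every abbreviation substitution in turn."""
    for pattern, expansion in _abbreviations:
        text = pattern.sub(expansion, text)
    return text
```

So `expand_abbreviations("Mr. Jones")` yields `"mister Jones"`, which is much easier for the model to map onto the audio.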

Other cleaners do things like normalise numbers (again, as it’s often not obvious how digits relate to what is actually spoken) or filter out characters that have little or no impact on pronunciation.

As to a step-by-step list of what to change, it will depend on the language, so you will need to use common sense and knowledge of the language, but you’d probably want to:

  • ensure that the characters used by that language (such as any accented characters) are accepted by the cleaning and aren’t removed

  • see that characters that don’t change pronunciation are removed

  • consider adding common abbreviations if they’re in your dataset or likely to be submitted by users of the model (assuming your backend doesn’t handle them as mentioned above)
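The first two points above can be sketched as a simple whitelist filter. The allowed-character set below is purely an assumption for a hypothetical language with German-style accented characters; you would adapt it to your own language's alphabet:

```python
import re
import unicodedata

# Hypothetical allowed set: basic Latin letters plus umlauts and eszett,
# along with punctuation that affects prosody. Adapt to your language.
_ALLOWED = re.compile(r"[^a-zäöüß .,!?'-]", re.IGNORECASE)

def clean_text(text):
    # Normalise Unicode so composed/decomposed accents compare equally,
    text = unicodedata.normalize("NFC", text)
    # lower-case, then drop anything outside the allowed set.
    text = text.lower()
    return _ALLOWED.sub("", text)
```

Here `clean_text("Schöne Grüße!")` keeps the accented characters intact, while characters outside the whitelist are silently dropped, which is exactly the behaviour you need to verify for your language's alphabet.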

I have about 30 hrs of data…
How long should I train to get decent output?

If you use TensorBoard you can monitor audio output as training proceeds. Broadly, the simplest rule is to keep training until it stops improving, so it’s best to just give it a go.

Actual training time will depend on your hardware, so I can’t really advise, but I’ve typically done training runs of somewhere between one and three days on a 1080 Ti, and it often produces acceptable results somewhat earlier.

I’ve trained the model and I have the .pth file.
I tried to use the Benchmark notebook under the notebooks folder for testing.
But I’m confused about the paths that we have to specify, especially VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH, and where do I get the vocoder-related files…
Can you please help?