Issue regarding cleaners

Can someone explain what cleaners are and what updates need to be made for languages other than English (step-by-step changes)…

Have you read the comments in the code and looked through the code?

You’ll likely get a better sense of what they do by experimenting with inputting some text and seeing what comes out of the functions.

Essentially they’re taking the raw transcript and processing it to take out things the TTS system won’t work well with, or to make it easier for it to learn the association between the (processed) text and the audio.

For instance, if we look at the abbreviations cleaner: it would be quite hard for a system to realise that the input “Mr.” should sound like “mister”, so one of the cleaners expands various titles. This one is easy to understand, but depending on the phoneme backend used (such as espeak) you may find this is done for you (i.e. espeak can handle “Mr Jones”, giving the phonemes for “mister Jones”).
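As a rough illustration of how such a cleaner works (the names and abbreviation list here are made up for the example, not the exact ones from the repository), it's usually just a table of regex substitutions applied to the text:

```python
import re

# Illustrative abbreviation cleaner: each (pattern, expansion) pair replaces
# an abbreviated title followed by a period with its spoken form.
_abbreviations = [
    (re.compile(r"\b%s\." % abbr, re.IGNORECASE), expansion)
    for abbr, expansion in [
        ("mr", "mister"),
        ("mrs", "misess"),
        ("dr", "doctor"),
        ("st", "saint"),
    ]
]

def expand_abbreviations(text):
    """Apply every abbreviation substitution in turn."""
    for pattern, expansion in _abbreviations:
        text = pattern.sub(expansion, text)
    return text
```

So `expand_abbreviations("Mr. Jones")` yields `"mister Jones"`, which is much easier for the model to map onto the audio.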

Other cleaners do things like normalise numbers (again, as it’s often not obvious how digits relate to what is actually spoken) or filter out characters that have little or no impact on pronunciation.

As to a step-by-step list of what to change, it will depend on the language, so you will need to use common sense and knowledge of the language, but you’d probably want to:

  • ensure that the characters used by that language (such as any accented characters) are accepted by the cleaning and aren’t removed

  • see that characters that don’t change pronunciation are removed

  • consider adding common abbreviations if they’re in your dataset or likely to be submitted by users of the model (assuming your backend doesn’t handle them as mentioned above)
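The first two points above can be sketched as a simple whitelist filter. The allowed-character set below is purely an assumption for a hypothetical language with German-style accented characters; you would adapt it to your own language's alphabet:

```python
import re
import unicodedata

# Hypothetical allowed set: basic Latin letters plus umlauts and eszett,
# along with punctuation that affects prosody. Adapt to your language.
_ALLOWED = re.compile(r"[^a-zäöüß .,!?'-]", re.IGNORECASE)

def clean_text(text):
    # Normalise Unicode so composed/decomposed accents compare equally,
    text = unicodedata.normalize("NFC", text)
    # lower-case, then drop anything outside the allowed set.
    text = text.lower()
    return _ALLOWED.sub("", text)
```

Here `clean_text("Schöne Grüße!")` keeps the accented characters intact, while characters outside the whitelist are silently dropped, which is exactly the behaviour you need to verify for your language's alphabet.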

I have about 30 hrs of data…
How long should I train to get decent output?

If you use TensorBoard you can monitor audio output as training proceeds. Broadly, the simplest rule is to keep training until it stops improving, so it’s best to just give it a go.

Actual training time will depend on your hardware, so I can’t really advise, but I’ve typically done training runs of somewhere between one and three days on a 1080 Ti, and it often produces acceptable results somewhat earlier.

I’ve trained the model and I have the .pth file.
I tried to use the Benchmark notebook under the notebooks folder for testing.
But I’m confused about the paths that we have to specify, especially VOCODER_MODEL_PATH and VOCODER_CONFIG_PATH, and where do I get the vocoder-related files…
Can you please help?