Do we need to change symbols when using phonemic text as input?

I am trying to build a TTS model for Hindi (Devanagari script). If we use the phonemizer library to convert the text (in a language the library supports) to phonemes and set use_phonemes to True in the config, does it make any difference whether we update _characters in symbols.py accordingly? At the end of the day, we will be giving phonemic text to the model as input.

Also, on the subject of Hindi text: some Devanagari characters are made up of more than one Unicode code point. For example, के is made of two code points, "क" and "े", which get split when you print them one by one in Python. So would it be fine to dump the Devanagari characters into symbols.py (for training both with and without phonemes) and go ahead with training?
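
For anyone who wants to check this, here is a small sketch using only the Python standard library. It shows that the syllable is stored as two code points, each of which encodes to three bytes in UTF-8. The variable names are just for illustration.

```python
import unicodedata

# The syllable "के" written with explicit escapes so the code points are unambiguous.
syllable = "\u0915\u0947"

# Iterating over a Python str yields one item per code point, not per visible glyph.
code_points = list(syllable)
names = [unicodedata.name(ch) for ch in code_points]

num_code_points = len(syllable)
num_utf8_bytes = len(syllable.encode("utf-8"))

for ch, name in zip(code_points, names):
    print(f"U+{ord(ch):04X} {name}")
print(num_code_points, "code points,", num_utf8_bytes, "bytes in UTF-8")
```

So a symbol table built per code point would need both the consonant and the vowel sign as separate entries.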

Anything in UTF-8 should work, so you can add them to symbols.py, or you can even give a custom character set in your config.json. In the end, all characters are mapped to integer IDs. If some characters are not supported, you can replace them with arbitrary supported characters in your text_cleaner.py.
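
To make the "mapped to integer IDs" part concrete, here is a minimal sketch of the idea. It is not the project's actual symbols.py: the names `_pad`, `symbol_to_id`, and `text_to_ids`, the tiny character set, and the choice to skip unknown characters are all assumptions made for illustration.

```python
# Hypothetical, tiny symbol table; a real one would list every character in the dataset.
_pad = "_"
_characters = "कख" + "\u0947"  # two consonants plus the vowel sign E (a combining mark)

symbols = [_pad] + list(_characters)
symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_ids(text):
    """Map each code point to its integer ID, skipping anything not in the table."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]


print(text_to_ids("\u0915\u0947"))  # IDs for the two code points of "के"
```

Because the lookup is per code point, it does not matter that a syllable such as के is stored as two code points, as long as both are in the table.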

Converting your text with an external phonemizer and setting up the symbols before training should work the same as using our phonemizer directly (I don't remember whether it supports Hindi).

Thanks for replying to the query.

For the second point you mentioned, my question is a bit different. I checked, and the phonemizer library supports Hindi. So, coming to the question: if we convert the input text directly to phonemic text, which is then fed to the model, do the _characters (which default to "ABCD…abcd…") have any significance when using a different language (given that use_phonemes is set to True)?

Do we pass the text through the text_cleaner first and then create its phonemic counterpart, or do we create the phonemic text directly? In the former case it makes sense to define _characters, but in the latter they may not have any significance.
I tried to work this out from the code, but such a large codebase is a bit hard to follow.

There are two options. In the first, you convert your dataset to phonemes externally, disable the internal phoneme support (use_phonemes=False), and train the model with your phoneme characters given in config.json.
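
For this first option you need the full list of symbols that appear in your converted dataset. A rough sketch for collecting them is below; `build_charset` is a hypothetical helper, and where exactly the resulting string goes in config.json depends on your version, so check its example config.

```python
def build_charset(lines):
    """Collect every distinct code point that appears in the given transcripts."""
    chars = set()
    for line in lines:
        chars.update(line)  # a str is iterated one code point at a time
    return "".join(sorted(chars))


# Toy transcripts standing in for an externally phonemized dataset.
transcripts = ["ab", "bc a"]
charset = build_charset(transcripts)
print(repr(charset))
```

Running this over all transcripts before training is a cheap way to catch symbols that would otherwise be missing from the character set.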

The second option is to use the internal phoneme support. If you do that, our code takes each sentence, passes it to the text_cleaner, converts it to phonemes, and feeds the result to the model.
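
The order of steps in this second option can be sketched roughly as follows. Everything here is a stand-in: `clean_text`, `toy_phonemize`, `TOY_LEXICON`, and `PHONEME_TO_ID` are hypothetical names, and the lookup table replaces a real phonemizer only so that the example runs without extra dependencies.

```python
def clean_text(text):
    # Stand-in for a text cleaner: lowercase and collapse whitespace.
    return " ".join(text.lower().split())


# Toy word-to-phoneme table; a real phonemizer would produce these strings.
TOY_LEXICON = {"hello": "h\u0259lo\u028a"}  # "həloʊ"


def toy_phonemize(text):
    # Words missing from the toy table are passed through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split(" "))


# Hypothetical phoneme table mapping each phoneme symbol to an integer ID.
PHONEME_TO_ID = {"h": 0, "\u0259": 1, "l": 2, "o": 3, "\u028a": 4}


def text_to_model_input(text):
    cleaned = clean_text(text)          # step 1: text cleaner
    phonemes = toy_phonemize(cleaned)   # step 2: convert to phonemes
    # step 3: map phoneme symbols to IDs, skipping symbols not in the table
    return [PHONEME_TO_ID[p] for p in phonemes if p in PHONEME_TO_ID]


print(text_to_model_input("  Hello "))
```

The point is only the ordering: cleaning happens on the raw text, and the integer IDs are looked up after phonemization.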

I’d suggest using the second option if you don’t have any special requirements.

Hope that's clear.