Warnings appear during training related to phonemizer on Arabic language

khalilrhouma · December 4, 2020, 9:48am

Working branch : dev

I am trying to fine-tune Tacotron-DDC-130k (trained on LJSpeech) on an Arabic dataset using the phonemizer. Arabic is supported under the ar tag.
During training, these warnings kept appearing for the entire epoch:

[WARNING] extra phones may appear in the "ar" phoneset
[WARNING] language switch flags have been removed (applying "remove-flags" policy)
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "ar" phoneset
[WARNING] language switch flags have been removed (applying "remove-flags" policy)
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "ar" phoneset
[WARNING] language switch flags have been removed (applying "remove-flags" policy)
[WARNING] 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "ar" phoneset
[WARNING] language switch flags have been removed (applying "remove-flags" policy)
[WARNING] 1 utterances containing language switches on lines 1
............

Where does the problem come from? Is there anything I need to change before training other than these parameters:

text_cleaner: phoneme_cleaners
use_phonemes: true
phoneme_language: ar

I think it may be related to text cleaning. Any help?
Thank you

othiele · December 4, 2020, 10:50am

You also have that as an issue, please close the issue if you continue here. Otherwise it is hard for people to follow.

Just to understand, you are using a 130k TacoDDC trained on LJSpeech English and you want to transfer that to Arabic with how much material?

The warning states that there is a mix of languages (not ar in this case) in your input. And the flags (e.g. <en>...</en>) for that have been removed. Do you have mixed material?

But @mrthorstenm and @repodiac are the experts on that. Post more and we might be able to help you.

khalilrhouma · December 4, 2020, 11:10am

Hello @othiele
Yes, I posted here because I didn’t get a reply on the issue. I have closed it now, as you asked.
The dataset is about 3 hours. This is a sample of the diacritized Arabic text, which is supported by the phonemizer:

إِضَافَةً إِلَى عَرْضِ أَنْوَاعٍ عَدِيدَةٍ مِنَ الْأَثَاثِ الْمُزَخْرَفِ وَالصَّنَادِيقِ وَالْأَوَانِي وَالْأَدَوَاتِ - وَالْأَغْطِيَةِ وَاللَّوْحَاتِ الْفَنِّيَّةِ الْعَاكِسَةِ لِتِقْنِيَّاتٍ مُدْهِشَةٍ - تُجَسِّدُ مَهَارَاتٍ حِرَفِيَّةً عَرَفَتْهَا الْأَنْدُلُسْ

We tested the phonemizer on the Arabic sentence directly and it didn’t output any warning. This is the phonemizer output:

ʔidˤa.ːfatan ʔilaː ʕardˤi. ʔanwaːʕin ʕadiːdatin mina alʔaθaːθi almuzaχɹafi was̪ːnaːdiːqi walʔauaːniː walʔadauaːti walʔaɣt̪i.ːati wallwħaːti alfanniːti alʕaːkisati litiqniːaːtin mudhiʃatin tudʒassdu mahaːɹaːtin ħiɹafiːtan ʕaɹafathaː alʔandulus

This is correct, with no errors or warnings, which is why I think the problem is related to the text cleaning from Mozilla TTS.

othiele · December 4, 2020, 12:51pm

Put in some logging where TTS uses the phonemizer and find out which utterances cause the messages. That is how we found out what we had to change.
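
The logging idea above could be sketched roughly as follows. This is only an illustration, not code from Mozilla TTS: find_noisy_utterances and phonemize_one are hypothetical names, and phonemize_one stands for whatever callable actually phonemizes a single utterance and writes its warnings to the given logger.

```python
import logging


class _CollectWarnings(logging.Handler):
    """Logging handler that stores WARNING-and-above messages in a list."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def find_noisy_utterances(utterances, phonemize_one, logger):
    """Return (utterance, warnings) pairs for utterances that logged warnings.

    phonemize_one is a hypothetical callable that phonemizes one utterance
    and reports problems through `logger`.
    """
    noisy = []
    for utterance in utterances:
        handler = _CollectWarnings()
        logger.addHandler(handler)
        try:
            phonemize_one(utterance)
        finally:
            # Always detach the handler so warnings stay tied to one utterance.
            logger.removeHandler(handler)
        if handler.messages:
            noisy.append((utterance, handler.messages))
    return noisy
```

Calling the phonemizer one utterance at a time is slow, but it ties each warning to the exact text that produced it, which the batched warnings ("on lines 1") do not.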

erogol · December 4, 2020, 2:21pm

I’d suggest running phonemizer over your whole dataset, listing the unique phones and comparing them with the phoneme set in symbols.py. If there are extras, you can add them there and send a PR so we can update it.
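
A minimal sketch of that comparison, under some assumptions: the helper names are made up for illustration, phones are counted per character, and known_symbols is whatever collection of phoneme characters you build from symbols.py. The phonemize call assumes the phonemizer package with the espeak backend is installed.

```python
def phonemize_lines(lines, language="ar"):
    """Phonemize a list of sentences (assumes phonemizer + espeak are installed)."""
    # Imported lazily so the helpers below also work without phonemizer.
    from phonemizer import phonemize
    return phonemize(lines, language=language, backend="espeak", strip=True)


def unique_phones(phonemized_lines):
    """Collect every distinct non-whitespace character in the phonemizer output."""
    phones = set()
    for line in phonemized_lines:
        phones.update(ch for ch in line if not ch.isspace())
    return phones


def extra_phones(phonemized_lines, known_symbols):
    """Return the characters in the output that are missing from known_symbols."""
    return unique_phones(phonemized_lines) - set(known_symbols)
```

Anything returned by extra_phones would be a candidate to add to the phoneme set.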

khalilrhouma · December 4, 2020, 4:59pm

@erogol
We tested the whole dataset with the phonemizer alone, and it gives the expected output without any warnings.
Actually, the problem was related to phoneme_cleaners, specifically the step where the ASCII conversion happens: text = convert_to_ascii(text)
The phonemizer accepts the Arabic letters as they are, and that conversion changes the letters completely into another form which is not Arabic.
So, for now we are using basic_cleaners in place of phoneme_cleaners, and the warnings are gone (so far).
Is there any way to disable the cleaners?
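
One way to check whether a given cleaner leaves Arabic text intact is sketched below. The helper names are hypothetical, and the check only counts characters whose Unicode name starts with ARABIC; it is a quick sanity test, not a full comparison.

```python
import unicodedata


def arabic_char_count(text):
    """Count characters (letters and diacritics) named ARABIC ... in Unicode."""
    return sum(
        1 for ch in text if unicodedata.name(ch, "").startswith("ARABIC")
    )


def keeps_arabic(cleaner, text):
    """True if the cleaner leaves the number of Arabic characters unchanged."""
    return arabic_char_count(cleaner(text)) == arabic_char_count(text)
```

Passing a cleaner function and a sample sentence from the dataset to keeps_arabic returns False as soon as the cleaner drops or transliterates Arabic characters.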

erogol · December 4, 2020, 5:08pm

You can pass an identity function that simply returns the text as it is.
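
Such a function could look like the sketch below. The name identity_cleaners is hypothetical, and the small CLEANERS table with clean_text is only a stand-in to show a cleaner being selected by name; it is not the actual lookup code in TTS.

```python
def identity_cleaners(text):
    """Hypothetical cleaner that returns the text unchanged."""
    return text


# Stand-in registry for illustration: maps a cleaner name to its function.
CLEANERS = {"identity_cleaners": identity_cleaners}


def clean_text(text, cleaner_name):
    """Apply the cleaner registered under cleaner_name to text."""
    return CLEANERS[cleaner_name](text)
```

With this stand-in, clean_text(sentence, "identity_cleaners") hands the Arabic text to the phonemizer untouched.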

mrthorstenm · December 4, 2020, 6:42pm

Hello @khalilrhouma.

I didn’t have time to answer earlier.
Here is some info on what we did for German phoneme cleaning.

We appended the following lines at the end of “TTS/TTS/tts/utils/text”

def german_phoneme_cleaners(text):
    return GermanTransliterate(replace={';': ',', ':': ' '}, sep_abbreviation=' -- ').transliterate(text)

Maybe that’s the place where you could make your own function returning the original value.
Reference your method name in your config.json as we did:

"text_cleaner": "german_phoneme_cleaners",

We had some discussion on phoneme topic in another thread, maybe this is helpful for you.

khalilrhouma · December 5, 2020, 7:15am

Thank you all for your support and your time.
As @erogol said, we just need a function that returns the input as it is. The basic_cleaners function seems to do that properly, and we checked that it returns our input unchanged without issues.
So thanks again, I really appreciate your work on Mozilla TTS.

Pak · February 10, 2021, 12:34pm

Hello,
Can you share your pipeline for training Arabic? I have my own dataset, so I don’t need the data.
Thank you.

erogol · February 10, 2021, 1:24pm

So did it work? How did you get your Arabic model?

Pak · February 11, 2021, 6:03am

I installed espeak-ng to get “ar” support, but got “xcb_connection_has_error() returned true”. I fixed it by running “unset DISPLAY”, and training is running now. I will report the result.

erogol · February 11, 2021, 1:12pm

Google is your friend. Not related to TTS.