Creating a GitHub page for hosting community-trained models

I think we can create a GitHub page to host community-driven models (including models I trained) to enable better distribution.

I wonder who would like to share models and help with that?

Maybe we can incorporate models you trained, @synesthesiam.

Count me in, @erogol :slight_smile:.
I’ve already put some German models I trained (based on my dataset) on my GitHub page here:

Thank you @erogol for all of the excellent work on MozillaTTS :slight_smile:

Sure, I’d be happy to contribute! …the only problem is that the models I’ve trained are not compatible with upstream MozillaTTS. The vocoders should be fine, however.

I made a few tweaks to have more control over the phonemes. Specifically:

  • A phoneme_backend option that lets me use gruut instead of phonemizer
  • A characters.sort_phonemes boolean that disables phoneme sorting in the text utils
  • A characters.eos_bos_phonemes boolean that disables the addition of EOS/BOS symbols

Mostly, these changes ensure that the characters.phonemes list is preserved in order, and that nothing (besides the pad symbol) is automatically added.
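For concreteness, a config fragment with these options might look something like this (only the three option names above are from my changes; the surrounding structure and example phoneme list are illustrative):

```python
# Hypothetical MozillaTTS config fragment illustrating the options
# described above. "phoneme_backend", "sort_phonemes", and
# "eos_bos_phonemes" come from the post; everything else is a stand-in.
config = {
    "phoneme_backend": "gruut",  # instead of the default "phonemizer"
    "characters": {
        "pad": "_",
        "phonemes": ["ð", "ɪ", "s", "z", "ə", "t", "ɛ"],  # example inventory
        "sort_phonemes": False,     # keep the list in the order given
        "eos_bos_phonemes": False,  # do not add EOS/BOS symbols
    },
}

# With sorting and EOS/BOS disabled, the symbol table is just the pad
# symbol followed by the phonemes, in their original order:
symbols = [config["characters"]["pad"]] + config["characters"]["phonemes"]
print(symbols)  # ['_', 'ð', 'ɪ', 's', 'z', 'ə', 't', 'ɛ']
```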

But the use of gruut over phonemizer is probably going to be a show stopper for most people. Phonemes in gruut come from pre-built dictionaries or pre-trained grapheme-to-phoneme models, which lets me do some neat things like apply accents to voices. It also does tokenization, text cleaning, and number/currency expansion with the help of Babel and num2words.

Let me know how I can help :+1:

Hi @synesthesiam - you suggested gruut might be a “show stopper for most people”. How does it compare with using phonemizer with espeak-ng as a backend?

Is one of the concerns that it doesn’t have such broad language coverage?

If the phoneme symbols are consistent then presumably people can switch back and forth between it and phonemizer to see how it compares - I would be interested to give that a go, is there anything I should bear in mind when trying it?

Happy to move this discussion into a separate thread if that’s better.

Both gruut and phonemizer produce IPA, but gruut uses pre-built lexicons and g2p models. I haven’t tested how consistent the IPA is between the two, but I’d expect it to be pretty good for U.S. English (gruut’s U.S. English phoneme inventory is here).
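One quick way to sanity-check the two backends would be to phonemize the same word list with each and compare the symbol sets they emit; a rough sketch (the pronunciations below are hard-coded stand-ins for real gruut/phonemizer output):

```python
# Compare the IPA symbol inventories produced by two phonemizer
# backends on the same words. Real use would fill these dicts from
# gruut and phonemizer output; here they are illustrative stand-ins.
gruut_out = {"this": ["ð", "ɪ", "s"], "is": ["ɪ", "z"]}
phonemizer_out = {"this": ["ð", "ɪ", "s"], "is": ["ɪ", "z"]}

gruut_symbols = {p for pron in gruut_out.values() for p in pron}
phonemizer_symbols = {p for pron in phonemizer_out.values() for p in pron}

only_gruut = gruut_symbols - phonemizer_symbols
only_phonemizer = phonemizer_symbols - gruut_symbols
# Empty sets mean the two backends agree on this sample.
print(only_gruut, only_phonemizer)
```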

For me, an important feature of gruut is that a word can have multiple pronunciations. “read”, for example, has both /ɹɛd/ (like “red”) and /ɹiːd/ (like “reed”). You can get the second pronunciation in gruut with “read_2” in your text (with word indexes enabled).
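As a sketch of the word-index idea (this is an illustration of the concept, not gruut’s actual API):

```python
# A lexicon can store several pronunciations per word; a suffix like
# "_2" in the input text selects one explicitly (1-based index).
lexicon = {
    "read": [
        ["ɹ", "ɛ", "d"],   # past tense, like "red"
        ["ɹ", "iː", "d"],  # present tense, like "reed"
    ],
}

def lookup(word):
    """Return a pronunciation, honoring a trailing _N word index."""
    base, _, idx = word.partition("_")
    prons = lexicon[base]
    return prons[int(idx) - 1] if idx else prons[0]

print(lookup("read"))    # first pronunciation by default
print(lookup("read_2"))  # explicit second pronunciation
```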

Thanks! Gruut has two stages: (1) tokenization and (2) phonemization. The command-line tool takes text for the first stage and produces JSON that feeds the second. You can skip the first stage if you know exactly what you want:

$ echo '{ "clean_words": ["this", "is", "a", "test"] }' | bin/gruut en-us phonemize | jq .pronunciation
[
  [
    "ð",
    "ɪ",
    "s"
  ],
  [
    "ɪ",
    "z"
  ],
  [
    "ə"
  ],
  [
    "t",
    "ɛ",
    "s",
    "t"
  ]
]
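If you’d rather consume that JSON from Python than pipe it through jq, something like this works (the "pronunciation" field name is taken from the example output above; the flattening logic is just a sketch of what a TTS front end might do):

```python
import json

# Parse phonemize output like the example above and flatten it into a
# single phoneme string with spaces as word boundaries.
raw = '{"pronunciation": [["ð","ɪ","s"], ["ɪ","z"], ["ə"], ["t","ɛ","s","t"]]}'
pronunciation = json.loads(raw)["pronunciation"]

phonemes = []
for i, word in enumerate(pronunciation):
    if i:
        phonemes.append(" ")  # keep word boundaries
    phonemes.extend(word)

print("".join(phonemes))  # ðɪs ɪz ə tɛst
```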

Might be a good idea :slight_smile: I’d be interested in feedback for non-English languages especially.

I guess that to accommodate your models, we first need to enable gruut in TTS.

But maybe gruut and phonemizer generate the same outputs, or at least use the same IPA characters. In that case, we could swap in phonemizer for gruut to kick-start your models in TTS.

Thanks, @erogol

As I get better at training models, I’d also be happy to train some phonemizer-based models for the community.

I have a 2080 Ti and three 1060s (6 GB). Any tips on how I might train models as fast as possible?

A few tricks for training faster:

  1. Use gradual training.
  2. Use mixed precision (as in the dev branch), which fits more instances in a batch.
  3. Fine-tune a model you trained before.
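A rough config sketch for these tips (the exact schema here is an assumption; gradual training schedules are commonly expressed as a list of [start_step, r, batch_size] entries, and the checkpoint path is a placeholder):

```python
# Hypothetical training-config fragment illustrating the three tips.
config = {
    "mixed_precision": True,  # fit more instances per batch
    # Gradual training: start with a coarse reduction factor r and
    # refine it as training progresses. Larger r means fewer decoder
    # steps per frame, hence faster early epochs.
    "gradual_training": [
        [0, 7, 64],
        [10000, 5, 64],
        [50000, 3, 32],
        [130000, 2, 32],
        [290000, 1, 32],
    ],
    # Fine-tuning: restore a previously trained checkpoint (placeholder path).
    "restore_path": "path/to/previous/checkpoint.pth.tar",
}

def active_r(step, schedule):
    """Return the reduction factor r in effect at a given global step."""
    r = schedule[0][1]
    for start, new_r, _batch in schedule:
        if step >= start:
            r = new_r
    return r

print(active_r(60000, config["gradual_training"]))  # 3
```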