Contributing my German voice for TTS

I’m currently preparing this.

@erogol that’s the config we used for the last training run after experimenting with lots of config parameters. In this one we just had the issue of forgetting to set the German phoneme cleaners by @repodiac.

config.zip (3,9 KB)

Just in case it is of help: I can provide the dataset normalized with pyloudnorm to -24dB.
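
Just as a sketch of what that normalization looks like with pyloudnorm (file names are placeholders, the actual script may differ):

    # normalize one wav file to -24 LUFS with pyloudnorm
    import soundfile as sf
    import pyloudnorm as pyln

    data, rate = sf.read("wavs/sample.wav")
    meter = pyln.Meter(rate)                               # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)
    normalized = pyln.normalize.loudness(data, loudness, -24.0)
    sf.write("wavs_normalized/sample.wav", normalized, rate)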

The do_sound_norm option corrected the problem a bit. I finished the Tacotron2 training and have now started PWGAN training as the vocoder.

I made the same mistake. Now the model does not work for numbers written as digits. This can be solved at runtime by using German number normalization from an external library.
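
One example of such an external library (my example, not necessarily the one meant above) is num2words, which could be wired in roughly like this:

    # sketch: expand digits into German number words before synthesis
    import re
    from num2words import num2words

    def normalize_numbers_de(text):
        # replace every standalone integer with its German word form
        return re.sub(r"\d+", lambda m: num2words(int(m.group()), lang="de"), text)

    print(normalize_numbers_de("Es sind 3 Grad."))  # -> "Es sind drei Grad."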

Please have a look at the german_transliterate library: https://github.com/repodiac/german_transliterate

  • sed '/import re/a from german_transliterate.core import GermanTransliterate' cleaners.py > cleaners-new.py
  • mv cleaners-new.py cleaners.py
  • echo -e "\ndef german_phoneme_cleaners(text):" >> cleaners.py
  • echo -e "\treturn GermanTransliterate(replace={';': ',', ':': ' '}, sep_abbreviation=' -- ').transliterate(text)" >> cleaners.py

could help then :slight_smile:
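
For reference, the snippet those commands append to cleaners.py ends up looking roughly like this (the exact file location depends on your TTS checkout):

    # appended to cleaners.py; the import line goes right after "import re"
    from german_transliterate.core import GermanTransliterate

    def german_phoneme_cleaners(text):
        return GermanTransliterate(
            replace={';': ',', ':': ' '},
            sep_abbreviation=' -- '
        ).transliterate(text)

The new cleaner then has to be selected in the training config (the text cleaner setting) so it is actually used.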

… and everything else necessary is wrapped up as Docker container here: https://github.com/repodiac/tit-for-tat

thx, I’ll take a look after I train the vocoder.

Here are the v0.1 results


I’ve added the recipe https://github.com/erogol/TTS_recipes


just my 2 cents… as a state-of-the-art way of dealing with these rather complicated setups, I would welcome and recommend using Docker containers instead, so that training and inference become a “turn-key solution” without much or even any manual fiddling.

As an example or blueprint, I have put @mrthorstenm’s setup into a Dockerfile, as he mentioned above: https://github.com/repodiac/tit-for-tat/blob/master/thorsten-TTS/Dockerfile


This is great. Pleased to see donors. I wish I could donate my own voice one day!

Created a notebook https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials


Dataset “thorsten” is now available in version 02.

Improvements over version 01:

  • normalized to -24dB
  • split metadata.csv into shuffled metadata_train.csv and metadata_val.csv (a minimal sketch of such a split is shown below)
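
A minimal sketch of such a split (not the exact script used; seed and validation size are just examples):

    # shuffle an LJSpeech-style metadata.csv and split it into train/val files
    import random

    with open("metadata.csv", encoding="utf-8") as f:
        lines = f.readlines()

    random.seed(42)      # fixed seed so the split is reproducible
    random.shuffle(lines)

    n_val = 100          # example validation size
    with open("metadata_val.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[:n_val])
    with open("metadata_train.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[n_val:])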

For compatibility reasons I’ll keep version 1 available.

See my GitHub page for dataset details and the download URL.


Hi guys, just to let you know in case you are interested: feel free to check out my recent upload of https://github.com/repodiac/espeak-ng_german_loan_words - it is a brief tutorial with code for automatically creating an additional dictionary for espeak-ng with ~10k German loan words.

These may improve TTS preprocessing when using phonemes (because loan words are then pronounced correctly instead of being “German-ized”).
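
A quick way to see the effect is to ask espeak-ng directly for its IPA output of such words, before and after installing the extra dictionary (small sketch; assumes espeak-ng is on the PATH):

    # print espeak-ng's IPA output for some loan words using the German voice
    import subprocess

    for word in ["Server", "Song", "Computer"]:
        result = subprocess.run(
            ["espeak-ng", "-v", "de", "-q", "--ipa", word],
            capture_output=True, text=True,
        )
        print(word, "->", result.stdout.strip())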


@erogol @mrthorstenm Many thanks for the computing and the voice donation, respectively. I have several questions:

  1. Below you see the performance on a PC with a quad-core i5 CPU with CUDA turned off; two out of four cores seem to be used during computation:

    sentence = "In Deutschland starben bislang zwar weniger Menschen an Covid-19 als etwa in Belgien oder Großbritannien. Eine neue Studie zeigt jedoch: Bei Patienten, die ins Krankenhaus mussten, sind die Verläufe überall ähnlich."
    align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, use_cuda, ap, use_gl=False, figures=True)

     (333312,)
      > Run-time: 36.56012582778931
      > Real-time factor: 2.4186008842661977
      > Time per step: 0.00010968712780813468
    

How can I improve this? (See the sketch after this list for how the real-time factor relates to these numbers.)

  2. Can this kind of stuff run in real time on a Jetson Nano?

  3. Remark: The word ‘mussten’ was pronounced as ‘müssten’ and ‘Verläufe’ as ‘Verlaufe’ :slight_smile:
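
For reference, here is how the real-time factor in question 1 follows from the printed numbers (the 22050 Hz output sample rate is my assumption, but it matches the reported factor):

    # relate run-time, number of samples and real-time factor
    sample_rate = 22050                       # assumed model output rate
    n_samples = 333312
    run_time = 36.56012582778931

    audio_seconds = n_samples / sample_rate   # ~15.1 s of synthesized audio
    print(run_time / audio_seconds)           # ~2.42, the reported real-time factor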

  1. To improve performance, turn on CUDA :wink:
  2. @mrthorstenm experimented with synthesis on an RPi3, but it is far from real time. I ran an older Taco2 release on a Jetson Nano and the RT factor was 1:5 up to 1:10 (1 sec of audio required 5-10 sec of processing). In the meantime there is a TensorFlow version of Taco2; using it and converting the model to TFLite (see the generic sketch below) may improve performance on SBCs like the RPi and the Nano.
  3. This is a known issue; unfortunately this model was trained with the wrong phoneme cleaner configuration. A new model is in the works (no publishing date yet).
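
For the TFLite route mentioned in point 2, the conversion itself builds on the standard TensorFlow API; this is only a generic sketch, not Mozilla TTS’s own export script, and the paths are placeholders:

    # convert a TensorFlow SavedModel to a TFLite flatbuffer
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/tf_taco2_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]   # optional size/latency optimization
    tflite_model = converter.convert()

    with open("tacotron2.tflite", "wb") as f:
        f.write(tflite_model)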

Here’s the post @dkreutz mentioned.


I missed your post, Dominik. Your numbers are not promising for using a Jetson Nano for TTS purposes as described here, at least not for real-time applications.

Hello.

It’s time for another short update.
We’re currently preparing a new training run and figured out some “issues” with phoneme handling for mixed English/German wording.

This warning occurs quite often:

[WARNING] found 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "de" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)

We analyzed where these warnings come from and found out that our dataset (metadata.csv) contains several (408) phrases with non-native German words that are common in everyday German.

Some examples:

  • server
  • opensource
  • song
  • chat
  • team
  • computer
  • party
  • cool

Just a few phoneme samples with the default config (keep-flags):

  • Auf der Couch könnte sie es sich gemütlich machen.
    • aʊf dɛɾ (en)kaʊtʃ(de) kœntə ziː ɛs zɪç ɡəmyːtlɪç maxən
  • Wie kann man den Song so verschandeln?
    • viː kan man deːn (en)sɒŋ(de) zoː fɛɾʃandəln
  • Nicht alle Teenager sind so.
    • nɪçt alə (en)tiːneɪdʒə(de) zɪnt zoː
  • Währenddessen spricht sie mit ihrem Computer.
    • vɛːrəndɛsən ʃpɾɪçt ziː mɪt iːrəm (en)kəmpjuːtə(de)

Currently we’re discussing whether we should run training with the default option “--language-switch keep-flags” (with these warnings being produced) or with phoneme usage disabled in the config file.
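
For anyone who wants to compare the policies directly, the phonemizer library exposes them as a parameter (a sketch assuming a phonemizer version whose phonemize() accepts a language_switch argument; names may differ between releases):

    # compare language-switch policies on a mixed German/English phrase
    from phonemizer import phonemize

    text = "Währenddessen spricht sie mit ihrem Computer."

    for policy in ("keep-flags", "remove-flags", "remove-utterance"):
        print(policy, "->", phonemize(
            text,
            language="de",
            backend="espeak",
            language_switch=policy,
            strip=True,
        ))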

Wishing you all a nice weekend :slight_smile:


Hi @erogol
It’s a funny coincidence that your latest dev commit (https://github.com/mozilla/TTS/commit/4f3917b9a673a4039e577a8098f545978df5ea2f) matches our current group-internal discussion on “keep-flags” :slight_smile:
(See above post)
