Contributing my German voice for TTS

I’m currently preparing this.

@erogol that’s the config we used for the last training run after experimenting with lots of config parameters. In this one we just had the issue of forgetting to set the German phoneme cleaners by @repodiac.

config.zip (3,9 KB)

Just in case it is of help: I can provide the dataset normalized with pyloudnorm to -24dB.
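
Just as a sketch of what that normalization looks like with pyloudnorm (file names are placeholders, the actual script may differ):

    # normalize one wav file to -24 LUFS with pyloudnorm
    import soundfile as sf
    import pyloudnorm as pyln

    data, rate = sf.read("wavs/sample.wav")
    meter = pyln.Meter(rate)                               # ITU-R BS.1770 loudness meter
    loudness = meter.integrated_loudness(data)
    normalized = pyln.normalize.loudness(data, loudness, -24.0)
    sf.write("wavs_normalized/sample.wav", normalized, rate)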

The do_sound_norm option corrected the problem a bit. I finished the Tacotron2 training and have now started PWGAN training as the vocoder.

I made the same mistake. Now the model does not work for numbers written as digits. This can be solved at runtime by using German number normalization from an external library.
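
One example of such an external library (my example, not necessarily the one meant above) is num2words, which could be wired in roughly like this:

    # sketch: expand digits into German number words before synthesis
    import re
    from num2words import num2words

    def normalize_numbers_de(text):
        # replace every standalone integer with its German word form
        return re.sub(r"\d+", lambda m: num2words(int(m.group()), lang="de"), text)

    print(normalize_numbers_de("Es sind 3 Grad."))  # -> "Es sind drei Grad."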

Please have a look at the german_transliterate library: https://github.com/repodiac/german_transliterate

  • sed '/import re/a from german_transliterate.core import GermanTransliterate' cleaners.py > cleaners-new.py
  • mv cleaners-new.py cleaners.py
  • echo -e "\ndef german_phoneme_cleaners(text):" >> cleaners.py
  • echo -e "\treturn GermanTransliterate(replace={';': ',', ':': ' '}, sep_abbreviation=' -- ').transliterate(text)" >> cleaners.py

could help then :slight_smile:
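
For reference, the snippet those commands append to cleaners.py ends up looking roughly like this (the exact file location depends on your TTS checkout):

    # appended to cleaners.py; the import line goes right after "import re"
    from german_transliterate.core import GermanTransliterate

    def german_phoneme_cleaners(text):
        return GermanTransliterate(
            replace={';': ',', ':': ' '},
            sep_abbreviation=' -- '
        ).transliterate(text)

The new cleaner then has to be selected in the training config (the text cleaner setting) so it is actually used.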

… and everything else necessary is wrapped up as Docker container here: https://github.com/repodiac/tit-for-tat

thx, I’ll take a look after I train the vocoder.

Here are the v0.1 results


I’ve added the recipe https://github.com/erogol/TTS_recipes


just my 2 cents… as a state-of-the-art way of dealing with these rather complicated setups, I would welcome and recommend using Docker containers instead, so that training and inference become a “turn-key solution” without much or even any manual fiddling.

As an example or blueprint, I have put @mrthorstenm’s setup into a Dockerfile, as he mentioned above: https://github.com/repodiac/tit-for-tat/blob/master/thorsten-TTS/Dockerfile


This is great. Pleased to see donors. I wish I could donate my own voice one day!

Created a notebook https://github.com/mozilla/TTS/wiki/TTS-Notebooks-and-Tutorials


Dataset “thorsten” is now available in version 02.

Improvements over version 01:

  • normalized to -24dB
  • split metadata.csv into shuffled metadata_train.csv and metadata_val.csv (a minimal sketch of such a split is shown below)
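
A minimal sketch of such a split (not the exact script used; seed and validation size are just examples):

    # shuffle an LJSpeech-style metadata.csv and split it into train/val files
    import random

    with open("metadata.csv", encoding="utf-8") as f:
        lines = f.readlines()

    random.seed(42)      # fixed seed so the split is reproducible
    random.shuffle(lines)

    n_val = 100          # example validation size
    with open("metadata_val.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[:n_val])
    with open("metadata_train.csv", "w", encoding="utf-8") as f:
        f.writelines(lines[n_val:])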

For compatibility reasons I’ll keep version 1 available.

See my GitHub page for dataset details and the download URL.


Hi guys, just to let you know in case you are interested: feel free to check out my recent upload of https://github.com/repodiac/espeak-ng_german_loan_words - it is a brief tutorial with code for automatically creating an additional dictionary for espeak-ng with ~10k German loan words.

These may improve TTS preprocessing when using phonemes (because loan words are then pronounced correctly instead of being “German-ized”).
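
A quick way to see the effect is to ask espeak-ng directly for its IPA output of such words, before and after installing the extra dictionary (small sketch; assumes espeak-ng is on the PATH):

    # print espeak-ng's IPA output for some loan words using the German voice
    import subprocess

    for word in ["Server", "Song", "Computer"]:
        result = subprocess.run(
            ["espeak-ng", "-v", "de", "-q", "--ipa", word],
            capture_output=True, text=True,
        )
        print(word, "->", result.stdout.strip())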


@erogol @mrthorstenm Many thanks for the computing and the voice donation, respectively. I have several questions:

  1. Below you see the performance on a PC with a quad-core i5 CPU with CUDA turned off; two out of four cores seem to be used during computation:

    sentence = "In Deutschland starben bislang zwar weniger Menschen an Covid-19 als etwa in Belgien oder Großbritannien. Eine neue Studie zeigt jedoch: Bei Patienten, die ins Krankenhaus mussten, sind die Verläufe überall ähnlich."
    align, spec, stop_tokens, wav = tts(model, sentence, TTS_CONFIG, use_cuda, ap, use_gl=False, figures=True)

     (333312,)
      > Run-time: 36.56012582778931
      > Real-time factor: 2.4186008842661977
      > Time per step: 0.00010968712780813468
    

How can I improve this? (See the sketch after this list for how the real-time factor relates to these numbers.)

  2. Can this kind of stuff run in real time on a Jetson Nano?

  3. Remark: The word ‘mussten’ was pronounced as ‘müssten’ and ‘Verläufe’ as ‘Verlaufe’ :slight_smile:
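
For reference, here is how the real-time factor in question 1 follows from the printed numbers (the 22050 Hz output sample rate is my assumption, but it matches the reported factor):

    # relate run-time, number of samples and real-time factor
    sample_rate = 22050                       # assumed model output rate
    n_samples = 333312
    run_time = 36.56012582778931

    audio_seconds = n_samples / sample_rate   # ~15.1 s of synthesized audio
    print(run_time / audio_seconds)           # ~2.42, the reported real-time factor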

  1. To improve performance, turn on CUDA :wink:
  2. @mrthorstenm experimented with synthesis on an RPi3, but it is far from real time. I ran an older Taco2 release on a Jetson Nano and the RT factor was 1:5 up to 1:10 (1 sec of audio required 5-10 sec of processing). In the meantime there is a TensorFlow version of Taco2; using it and converting the model to TFLite (see the generic sketch below) may improve performance on SBCs like the RPi and the Nano.
  3. This is a known issue; unfortunately this model was trained with the wrong phoneme cleaner configuration. A new model is in the works (no publishing date yet).
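
For the TFLite route mentioned in point 2, the conversion itself builds on the standard TensorFlow API; this is only a generic sketch, not Mozilla TTS’s own export script, and the paths are placeholders:

    # convert a TensorFlow SavedModel to a TFLite flatbuffer
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/tf_taco2_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]   # optional size/latency optimization
    tflite_model = converter.convert()

    with open("tacotron2.tflite", "wb") as f:
        f.write(tflite_model)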

Here’s the post @dkreutz mentioned.


I missed your post, Dominik. Your numbers are not promising for using a Jetson Nano for TTS purposes as described here, at least not for real-time applications.

Hello.

It’s time for another short update.
We’re currently preparing a new training run and figured out some “issues” with phoneme handling for mixed English/German wording.

This warning occurs quite often:

[WARNING] found 1 utterances containing language switches on lines 1
[WARNING] extra phones may appear in the "de" phoneset
[WARNING] language switch flags have been kept (applying "keep-flags" policy)

We analyzed where these warnings come from and found out that our dataset (metadata.csv) contains several (408) phrases with non-native German words that are common in everyday German.

Some examples:

  • server
  • opensource
  • song
  • chat
  • team
  • computer
  • party
  • cool

Just a few phoneme samples with the default config (keep-flags):

  • Auf der Couch könnte sie es sich gemütlich machen.
    • aʊf dɛɾ (en)kaʊtʃ(de) kœntə ziː ɛs zɪç ɡəmyːtlɪç maxən
  • Wie kann man den Song so verschandeln?
    • viː kan man deːn (en)sɒŋ(de) zoː fɛɾʃandəln
  • Nicht alle Teenager sind so.
    • nɪçt alə (en)tiːneɪdʒə(de) zɪnt zoː
  • Währenddessen spricht sie mit ihrem Computer.
    • vɛːrəndɛsən ʃpɾɪçt ziː mɪt iːrəm (en)kəmpjuːtə(de)

Currently we’re discussing whether we should run training with the default option “--language-switch keep-flags” (with these warnings being produced) or with phoneme usage disabled in the config file.
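
For anyone who wants to compare the policies directly, the phonemizer library exposes them as a parameter (a sketch assuming a phonemizer version whose phonemize() accepts a language_switch argument; names may differ between releases):

    # compare language-switch policies on a mixed German/English phrase
    from phonemizer import phonemize

    text = "Währenddessen spricht sie mit ihrem Computer."

    for policy in ("keep-flags", "remove-flags", "remove-utterance"):
        print(policy, "->", phonemize(
            text,
            language="de",
            backend="espeak",
            language_switch=policy,
            strip=True,
        ))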

Wishing you all a nice weekend :slight_smile:


Hi @erogol
It’s a funny coincidence that your latest dev commit (https://github.com/mozilla/TTS/commit/4f3917b9a673a4039e577a8098f545978df5ea2f) matches our current group-internal discussion on “keep-flags” :slight_smile:
(See above post)
