Contributing my German voice for TTS

Hello dear TTS fellows.

We can celebrate a birthday today: the first post in this thread was written on November 5th, 2019 - exactly one year ago :slight_smile: - and there's no end in sight.

I just want to say thank you to all of you guys (a list of names would probably be too long) who inspired, followed, motivated, helped and supported me on this journey.

So a huge round of applause for a nice community.

with all the best wishes for all of you
Thorsten

4 Likes

For those who want something pre-packaged, I have a GlowTTS/Multi-band MelGAN combo trained from @mrthorstenm’s dataset available here: https://github.com/rhasspy/de_larynx-thorsten

You can run a Docker container:

$ docker run -it -p 5002:5002 \
      --device /dev/snd:/dev/snd \
      rhasspy/larynx:de-thorsten-1

and then visit http://localhost:5002 for a test page. There’s a /api/tts endpoint available, and it even mimics the MaryTTS API (/process) so you can use it in any system that supports MaryTTS.
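As a rough sketch of how the two HTTP endpoints can be called from code (the parameter names for the MaryTTS-compatible `/process` endpoint follow the usual MaryTTS conventions here and are an assumption, not taken from the larynx docs):

```python
from urllib.parse import urlencode

BASE = "http://localhost:5002"

# plain endpoint: GET /api/tts?text=...
tts_url = f"{BASE}/api/tts?" + urlencode(
    {"text": "Können Sie bitte langsamer sprechen?"}
)

# MaryTTS-compatible endpoint: GET /process?INPUT_TEXT=...
mary_url = f"{BASE}/process?" + urlencode(
    {
        "INPUT_TEXT": "Können Sie bitte langsamer sprechen?",
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE",
    }
)

# fetching either URL (e.g. with urllib.request.urlopen) should
# return WAV audio bytes while the container is running
```

Because the `/process` route mimics MaryTTS, any client that already speaks the MaryTTS protocol can point at port 5002 without changes.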

If you happen to use Home Assistant, there’s a Hass.io add-on available as well :slight_smile:

4 Likes

Many thanks for this post - great to see Thorsten's voice used in different projects, even though I don't like the results of Glow-TTS in general.

Your trained model has the typical issue with German umlauts, like other trained models; see for instance umlaut issue.

Can I ask whether you modify the phoneme_cleaner function so that it does not perform transliteration (i.e. turning special characters like ä into a, and so on)? It's strange that you have problems with umlaut characters. I also train on a language with special characters and have no pronunciation problems :slight_smile: However, I got better results once I implemented a pronunciation dictionary and espeak-ng. Worth a try. :slight_smile:
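To illustrate what such a transliterating cleaner step does to umlauts, here is a minimal sketch using Unicode decomposition (not the actual cleaner code from any TTS repo, just the same idea):

```python
import unicodedata

def ascii_transliterate(text: str) -> str:
    # decompose characters (ö -> o + combining diaeresis),
    # then drop the combining marks - the umlaut is silently lost
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(ascii_transliterate("können"))  # -> konnen
```

If a cleaner like this runs before phonemization, the model never sees the ö at all, which would explain mispronounced umlauts even with a correct pronunciation dictionary.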

1 Like

I use a separate tool called gruut that uses a pronunciation dictionary and grapheme-to-phoneme model to generate IPA. It correctly produces /k œ n ə n/ for können, so I’m not sure why the problem persists.
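The lexicon-first approach described above can be sketched roughly like this (`LEXICON` and `phonemize` are hypothetical illustrations of the idea, not gruut's actual API):

```python
# hypothetical one-entry lexicon; a real pronunciation
# dictionary contains many thousands of words
LEXICON = {
    "können": ["k", "œ", "n", "ə", "n"],
}

def phonemize(word, g2p_model=None):
    """Look the word up in the pronunciation dictionary first;
    fall back to a grapheme-to-phoneme model for unknown words."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    if g2p_model is not None:
        return g2p_model(word)
    raise KeyError(f"no pronunciation for {word!r}")

print(phonemize("können"))  # -> ['k', 'œ', 'n', 'ə', 'n']
```

Since the dictionary lookup already yields the correct IPA, the mispronunciation has to come from a later stage of the pipeline.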

Maybe I’m just hitting a limitation of GlowTTS?

@synesthesiam and I discussed this weird umlaut issue here: https://github.com/thorstenMueller/deep-learning-german-tts/issues/10#issuecomment-716823273

In our small TTS group, @repodiac has the most experience with German phoneme cleaning - maybe he can support us on this.

1 Like

Turns out my problem was the phoneme cleaners! I just switched to “no_cleaners”, and that seems to have fixed it (to my untrained ear): https://github.com/rhasspy/de_larynx-thorsten/blob/master/samples/Können_Sie_bitte_langsamer_sprechen.wav

I’ve updated the Docker images :slight_smile:

3 Likes

Hi @synesthesiam.
Thanks for the update. The umlaut in “Können” is now pronounced much better and is easy to understand (to my German-trained ear :wink:).

1 Like

Inspired by the TTS audio sample comparison pages from @edresson1 and @erogol, I published some audio samples from our group (still some work to do) on a simple webpage.

5 Likes

This is great, thank you! I forgot to add this to my page: my GlowTTS model was trained for 380K steps and the vocoder was trained for 500k steps.

Also, I think sample 5 for the ParallelWaveGAN does not match the text.

Hi @synesthesiam.
Thanks for your nice feedback. I added the details on training steps and audio sample #5 to my site.
Additionally, I changed the text from “Anfang vom Froschkönig” (translated: “beginning of the Frog King”, a German fairy tale) to the actually spoken phrase.

Thanks for your feedback - it makes much more sense now.

1 Like

Just for general information:
User monatis from the TensorSpeech / TensorFlowTTS repo is training a model based on my public dataset.

Notebook, details and (work in progress) samples can be found here.
I’ve added some samples on my vocoder comparison page.

3 Likes

Thank you Thorsten.

I gave de_larynx-thorsten a try but was not happy with the final result; at least I would not use it as a German TTS backend myself. I don’t know if it’s GlowTTS or MelGAN in general. On the other hand, I like that it includes a TTS server by default, which turns out to be very useful.

We’re aware that there is a lot of room for quality improvement, but we’re still in training mode.
@dkreutz is training a VocGAN model and I’m training a WaveGrad model (thanks to great support from @sanjaesc). We’ll see if quality improves over the previous models.

1 Like

Hey guys.

Thanks to @nmstoker I tried the dev TensorBoard and am sharing the current WaveGrad training with the “thorsten” dataset.

The audio still has random noise in the background, but I hope it will decrease with more training - or is this a good point to use tune_wavegrad.py?

Soundcloud audio sample

1 Like

Try 50 iterations - it should be fine by now. During training it only uses 12 iterations, I guess for faster runtime.
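A WaveGrad-style inference noise schedule is just a sequence of beta values, one per denoising iteration; more iterations mean finer refinement steps. A generic sketch (the linear spacing and the start/end values are illustrative assumptions, not the tuned schedule that tune_wavegrad produces):

```python
def linear_beta_schedule(num_iters: int,
                         beta_start: float = 1e-6,
                         beta_end: float = 1e-2) -> list:
    # one noise level per denoising iteration; more iterations
    # mean finer steps and usually less residual background noise
    step = (beta_end - beta_start) / (num_iters - 1)
    return [beta_start + i * step for i in range(num_iters)]

schedule = linear_beta_schedule(50)  # e.g. 50 inference iterations
```

Tools like tune_wavegrad search for a schedule that sounds good with far fewer iterations than were used during training.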

Great hint. The currently published models are at about 97K steps for Tacotron2 and 800K steps for MB-MelGAN. As you said, there is room for improvement.

p.s. convergence is expected around 100K and 950K steps, respectively.

After a short discussion in the Mycroft chat I uploaded some new samples of the current WaveGrad training.

Some facts:

  • existing Taco2 model of my dataset (460K steps), trained by @othiele
  • WaveGrad model training currently running (right now at 350K steps)
  • tune_wavegrad still pending to get the noise schedule

I’ll keep the WaveGrad training running up to 500K steps and then pause it to run tune_wavegrad.

1 Like

Nice job - eager to see the final results. I am sure you will add same-sentence examples to your overview page once finished, for direct comparison: https://thorstenmueller.github.io/deep-learning-german-tts/audio_compare

Thanks @TheDayAfter.
Currently I’m playing around with WaveGrad and the noise schedule. Based on these vocoder test results I’ll continue WaveGrad or Taco2 model training.

If there’s something new to show, I’ll publish it on my comparison page.

1 Like