Contributing my German voice for TTS

Yes, disabled in both configs

Maybe it's best to open a new issue on GitHub then, or to ask in the issue you linked before.

I listened to the SpeedySpeech sample, but it doesn't even read the whole thing correctly. And the voice quality is no better than Glow-TTS, at least to my ear.

BTW, if anyone is willing to take on SpeedySpeech, we can work on that together. It'd be a nice addition to the repo.


One important thing to note.

Vocoder models are not fully optimized. I think there is a big performance gap that we can close with a good hyperparameter search.

Another option is to use the model's predictions to train the vocoder. In general, I observe better results that way.

You can find a list of about twenty audio samples generated by the SpeedySpeech model, and comparisons between SpeedySpeech + MelGAN vs. Tacotron2 + MelGAN vs. ground truth, here: https://janvainer.github.io/speedyspeech/

SpeedySpeech is definitely not perfect, but it looks like a good compromise between quality and performance, especially when you only have CPUs available. Glow-TTS sounds too unnatural and metallic to my ears.

Fine-tuning the models may improve both quality and performance.

Just a short update.

Due to the many possible combinations of TTS models and vocoders, I'm currently in discussion with @dkreutz and @othiele about whether we should write a roadmap on which combinations to try with the German “thorsten” dataset.

Personally, I'm in contact with @synesthesiam (Rhasspy and Home Assistant), who is kindly training GlowTTS + Multi-band MelGAN on the dataset. He uploaded some “training in progress” samples (the link can be found here: https://github.com/thorstenMueller/deep-learning-german-tts/issues/10).

Additionally, I'm currently trying to set up WaveGrad vocoder training based on this repo: https://github.com/ivanvovk/WaveGrad


Good! This fork already has support for Mozilla TTS spectrograms :slight_smile: https://github.com/freds0/wavegrad You either train with ground-truth (GT) spectrograms, or you run the preprocessing script to save the spectrograms and then load them there.


Hello, dear TTS fellows.

We can celebrate a birthday today, because the first post in this thread was written on November 5th, 2019, exactly one year ago :slight_smile: and there's no end in sight.

I just want to say thank you to all of you (a list of everyone by name would probably be too long) who inspired, followed, motivated, helped, and supported me on this journey.

So a huge round of applause for a nice community.

With all the best wishes for all of you,
Thorsten


For those who want something pre-packaged, I have a GlowTTS / Multi-band MelGAN combo trained on @mrthorstenm's dataset, available here: https://github.com/rhasspy/de_larynx-thorsten

You can run a Docker container:

$ docker run -it -p 5002:5002 \
      --device /dev/snd:/dev/snd \
      rhasspy/larynx:de-thorsten-1

and then visit http://localhost:5002 for a test page. There’s a /api/tts endpoint available, and it even mimics the MaryTTS API (/process) so you can use it in any system that supports MaryTTS.
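To call the endpoint from a script, you can build the request URL like this. This is a minimal sketch: the `text` query parameter name follows the Mozilla TTS demo server and is an assumption here, not documented Larynx API behavior.

```python
# Hedged sketch: build a GET URL for the server's /api/tts endpoint.
# The "text" parameter name is assumed from the Mozilla TTS demo server.
from urllib.parse import urlencode

def tts_url(base: str, text: str) -> str:
    """Return the /api/tts request URL for the given text."""
    return f"{base}/api/tts?{urlencode({'text': text})}"

url = tts_url("http://localhost:5002", "Können Sie bitte langsamer sprechen?")
# With the Docker container above running, fetch the WAV bytes with
# e.g. urllib.request.urlopen(url).read() and write them to a file.
```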

If you happen to use Home Assistant, there’s a Hass.io add-on available as well :slight_smile:


Many thanks for this post, and it's great to see Thorsten's voice used in different projects, though I don't like the results of Glow-TTS in general.

Your trained model has the typical issue with German umlauts, like other trained models; see for instance the umlaut issue.

Can I ask whether you modified the phoneme cleaner function so it does not perform transliteration, i.e. mapping special characters like ä to a and so on? It's strange that you have problems with umlaut characters. I also train on a language with special characters and have no pronunciation problems :slight_smile: However, I got better results once I implemented a pronunciation dictionary and espeak-ng. Worth a try. :slight_smile:
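To illustrate what such transliteration does to umlauts, here is a minimal, self-contained sketch (this is not the actual Mozilla TTS cleaner code, just the common ASCII-folding approach):

```python
import unicodedata

def ascii_transliterate(text: str) -> str:
    """Strip diacritics the way ASCII-only cleaners often do:
    decompose characters (NFKD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(ascii_transliterate("können"))  # -> "konnen": the umlaut is lost
print(ascii_transliterate("schön"))   # -> "schon": now a different word!
```

If a cleaner like this runs before phonemization, the phonemizer never sees the ö at all, which would explain wrong umlaut pronunciation.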


I use a separate tool called gruut that uses a pronunciation dictionary and grapheme-to-phoneme model to generate IPA. It correctly produces /k œ n ə n/ for können, so I’m not sure why the problem persists.
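The dictionary-plus-G2P idea can be sketched like this; it is illustrative only, since gruut's real API, lexicon format, and G2P model differ (the naive letter-split fallback below is a stand-in, and the IPA entry is taken from the post):

```python
# Illustrative sketch of a pronunciation lexicon with a G2P fallback.
LEXICON = {
    "können": ["k", "œ", "n", "ə", "n"],  # IPA as quoted above
}

def to_phonemes(word: str) -> list[str]:
    """Look the word up in the lexicon; fall back to 'G2P' (stubbed
    here as splitting into letters) for out-of-vocabulary words."""
    return LEXICON.get(word.lower(), list(word))

print(to_phonemes("können"))  # ['k', 'œ', 'n', 'ə', 'n']
```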

Maybe I’m just hitting a limitation of GlowTTS?

@synesthesiam and I discussed this weird umlaut issue here: https://github.com/thorstenMueller/deep-learning-german-tts/issues/10#issuecomment-716823273

In our small TTS group, @repodiac is the one with the most experience in German phoneme cleaning; maybe he can help with this.


Turns out my problem was the phoneme cleaners! I just switched to “no_cleaners”, and that seems to have fixed it (to my untrained ear): https://github.com/rhasspy/de_larynx-thorsten/blob/master/samples/Können_Sie_bitte_langsamer_sprechen.wav
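For anyone reproducing this: in Mozilla TTS the cleaner is selected via the `text_cleaner` key in `config.json`. A hedged fragment (key name from the upstream repo, all other fields omitted):

```json
{
  "text_cleaner": "no_cleaners"
}
```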

I’ve updated the Docker images :slight_smile:


Hi @synesthesiam.
Thanks for the update. The umlaut in “Können” is now pronounced much better and is easy to understand (to my German-trained ear :wink: ).


Inspired by the TTS audio sample comparison pages from @edresson1 and @erogol, I published some audio samples from our group (still some work to do) on a simple webpage.


This is great, thank you! I forgot to add this to my page: my GlowTTS model was trained for 380k steps and the vocoder was trained for 500k steps.

Also, I think sample 5 for the ParallelWaveGAN does not match the text.

Hi @synesthesiam.
Thanks for your nice feedback. I added details on the training steps and on audio sample #5 to my site.
Additionally, I changed the text from “Anfang vom Froschkönig” (translated: the beginning of “The Frog King”, a German fairy tale) to the actual spoken phrase.

Thanks for your feedback - it makes much more sense now.


Just for general information.
User monatis from the TensorSpeech / TensorFlowTTS repo is training a model based on my public dataset.

Notebook, details and (work in progress) samples can be found here.
I’ve added some samples on my vocoder comparison page.


Thank you Thorsten.

I gave de_larynx-thorsten a try but was not happy with the final result. At least I would not use it as a German TTS backend myself. I don't know if it's GlowTTS or MelGAN in general. On the other hand, I like that it includes a TTS server by default, which turns out to be very useful.