Cheers Neil, thank you so much. You are always so kind and helpful. I spent the entire evening trying to get it to work without success, but I just got it to start on my Mac and I am very happy, because it works. My problem is that Swedish has a lot of compound words, and I guess espeak hasn’t had a Swede work on the rules, so compounds are always mispronounced. Then I saw you discussing dictionaries (an idea I initially wanted to do away with, because I worked on them so much when I did concatenative TTS and I am sick of them), but now it looks like the only solution, because I really do not want to train a character-based TTS: I suspect it will perform poorly, and I really want the phonemes for out-of-domain words. I tried adding a dictionary entry now and it worked. I am also lucky because a few years ago the National Library of Oslo released a pronunciation dictionary as open source, so I get to use it. Now I will get my hands dirty and try to fix espeak-ng on Ubuntu too, so I can train with it.
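For anyone wanting to try the same: as far as I know, espeak-ng keeps per-language pronunciation exceptions in the `*_list` files under `dictsource/` and recompiles them with `espeak-ng --compile=<lang>`. A Swedish entry would look roughly like this (the word and phoneme string below are purely illustrative, not a correct transcription):

```
# dictsource/sv_list: one entry per line, word followed by its phonemes
sjukhus    S'u:k-h,u:s
```

Then, if memory serves, running `espeak-ng --compile=sv` from the `dictsource` directory rebuilds the Swedish dictionary with the new entry.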
Hello again @mrthorstenm! Is there any way I could email you with some questions? I’m working on my master’s thesis on applying prosody controls to Tacotron-2, and I’d love to get in touch and discuss potentially using your dataset for experimentation. Thank you and all the best!
Lately I have been working on improving TTS performance on compound and unseen words, since it is hit or miss, especially because you cannot dictate the stress; it is entirely up to Tacotron what it learns as a linguistic feature. One of the problems I had was that in short sentences (1 or 2 words) with unseen words, the stopnet sometimes tripped. That was also the case with compound words. I found that incorporating a pronunciation lexicon improves pronunciation massively and helps with the stopnet. My guess is that a large pronunciation lexicon covering a big portion of the words is consistent in its phonemic transcriptions, so training the TTS on those phoneme sequences makes the task much easier; when guessing, it might predict different phonemes for compound words (because it has not seen them) and trip.
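As a sketch of the idea, a lexicon lookup with a grapheme fallback could look like this (the entries and phoneme strings are invented for illustration, not taken from any real dictionary):

```python
# Minimal sketch: phonemize via lexicon lookup with a grapheme fallback.
# All entries and phoneme strings here are invented for illustration.
LEXICON = {
    "sjukhus": "ɧʉːkhʉːs",  # a compound word with a fixed transcription
    "hus": "hʉːs",
}

def phonemize(word, lexicon=LEXICON):
    """Return the lexicon transcription if present; otherwise fall back
    to the raw graphemes (a real system would call a G2P model here)."""
    return lexicon.get(word.lower(), word)

print(phonemize("Sjukhus"))  # lexicon hit: consistent phonemes
print(phonemize("katt"))     # miss: falls back to graphemes -> "katt"
```

The point is consistency: every occurrence of a covered compound maps to the same phoneme sequence, so the model never has to guess.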
Do you have an example TTS snippet of your voice? It would be nice to know how Mozilla TTS performs for the German language. I am new to Mozilla TTS and currently exploring the status quo.
Best Regards from Frankfurt
Hello @fabianbusch.
Currently there’s no model (or samples) available, since our Tacotron2 training is still running and we’re fine-tuning several parameters to figure out the best configuration.
The “thorsten” dataset is now freely available for German TTS training.
See my GitHub page for dataset details and the download URL.
Please read the “special thanks” section on GitHub for a list of the great supporters of this project. It’s a pleasure to work with you guys on a free German TTS model.
This speech corpus seems very, very good; great job! I only wish there were corpora like this one for all the other Germanic languages.
I posted an article on why I’ve chosen to contribute my voice.
@mrthorstenm I started DDC training with your dataset. So far samples look quite good. I’ll share the model once it is finished. And probably I’ll share a recipe too.
Hello.
German umlauts and phoneme cleaner issues:
@erogol we sometimes had problems with German umlauts in model training before using the German phoneme cleaner by @repodiac. Maybe it’s worth a look if you encounter umlaut problems too.
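For anyone curious what such a cleaner has to handle, one common approach is transliterating umlauts to ASCII digraphs before phonemization; a toy version of just that part might look like this (german_transliterate does considerably more than this sketch, and its actual behavior may differ):

```python
# Toy sketch of ASCII transliteration for German umlauts. A real cleaner
# such as german_transliterate also covers abbreviations, numbers, etc.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def clean_umlauts(text):
    """Replace each umlaut/eszett with its ASCII digraph equivalent."""
    for src, dst in UMLAUT_MAP.items():
        text = text.replace(src, dst)
    return text

print(clean_umlauts("Größe und Übung"))  # Groesse und Uebung
```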
TTS recipe
I wrote a shell script for starting training including pretasks and @repodiac wrapped it into a docker image.
The trained model produces very quiet audio. Have you experienced this before with your models?
Now I have started training with “do_sound_norm”: true to normalize the sound level. Hopefully it will mitigate the problem.
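For intuition, sound-level normalization roughly means rescaling every clip to a common target amplitude. Here is a toy peak-normalization sketch (what the do_sound_norm option actually does in the training code may differ in detail, e.g. it might normalize by a different measure):

```python
# Toy peak normalization: rescale a waveform so its loudest sample
# reaches a common target level. (The real do_sound_norm implementation
# may differ in detail.)
def peak_normalize(samples, target_peak=0.95):
    peak = max(abs(s) for s in samples)
    if peak == 0.0:          # avoid dividing by zero on silent clips
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

quiet_clip = [0.0, 0.1, -0.2, 0.05]  # a very quiet recording
loud_clip = [0.0, 0.4, -0.8, 0.2]    # same shape, four times louder
# After normalization both clips peak at 0.95, so their levels match.
print(peak_normalize(quiet_clip))
print(peak_normalize(loud_clip))
```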
Nice that you have the shell script. Please send a PR and we’ll take a look at it together.
We know that some recordings are a bit louder than others, but we thought this would be normalized during training, so setting this option to true seems to make sense.
@dkreutz sent me a sample in the past which starts at “normal” volume and then decreases to a lower level in the second part.
derkleineprinz.zip (492,4 KB)
Maybe @dkreutz can help on this volume issue.
I’m currently preparing this.
@erogol that’s the config we used for the last training, after experimenting with lots of config parameters. In this one we still had the issue of forgetting to set the German phoneme cleaners by @repodiac.
config.zip (3,9 KB)
The do_sound_norm option corrected the problem a bit. I finished the Tacotron2 training and have now started PWGAN training as a vocoder.
I made the same mistake. Now the model does not work for numbers written as digits. This can be solved at runtime by using German number normalization from an external library.
Please have a look at the german_transliterate library: https://github.com/repodiac/german_transliterate
# Insert the import after the existing "import re" line, then append the cleaner function.
sed '/import re/a from german_transliterate.core import GermanTransliterate' cleaners.py > cleaners-new.py
mv cleaners-new.py cleaners.py
printf '\ndef german_phoneme_cleaners(text):\n' >> cleaners.py
printf "\treturn GermanTransliterate(replace={';': ',', ':': ' '}, sep_abbreviation=' -- ').transliterate(text)\n" >> cleaners.py
That could help then.
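To illustrate what such runtime number normalization does, here is a toy version for two-digit numbers only (german_transliterate’s actual implementation is far more complete; this just shows the idea of spelling digits out before the text reaches the model):

```python
# Toy German number normalization for 0-99. A real normalizer such as
# german_transliterate handles arbitrary numbers, dates, units, etc.
UNITS = ["null", "eins", "zwei", "drei", "vier", "fünf", "sechs",
         "sieben", "acht", "neun", "zehn", "elf", "zwölf", "dreizehn",
         "vierzehn", "fünfzehn", "sechzehn", "siebzehn", "achtzehn",
         "neunzehn"]
TENS = {20: "zwanzig", 30: "dreißig", 40: "vierzig", 50: "fünfzig",
        60: "sechzig", 70: "siebzig", 80: "achtzig", 90: "neunzig"}

def spell_german(n):
    """Spell out an integer 0-99 in German words."""
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens * 10]
    unit_word = "ein" if unit == 1 else UNITS[unit]  # 1 -> "ein" in compounds
    return unit_word + "und" + TENS[tens * 10]

print(spell_german(42))  # zweiundvierzig
print(spell_german(21))  # einundzwanzig
```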
… and everything else necessary is wrapped up as Docker container here: https://github.com/repodiac/tit-for-tat