I listened to the SpeedySpeech sample, but it doesn’t even read the whole thing correctly. And the voice quality is no better than Glow-TTS, at least to my ear.
BTW, if anyone is willing to take on SpeedySpeech, we can work on that together. It’d be a nice addition to the repo.
You can find about twenty audio samples generated by the SpeedySpeech model, plus comparisons of SpeedySpeech + MelGAN vs. Tacotron2 + MelGAN vs. ground truth, here: https://janvainer.github.io/speedyspeech/
SpeedySpeech is definitely not perfect, but it looks like a good compromise between quality and performance, especially when only CPUs are available. Glow-TTS sounds too unnatural and metallic to my ears.
Fine-tuning the models may improve both quality and performance.
Because of the many possible combinations of TTS training and vocoder, I’m currently discussing with @dkreutz and @othiele whether we should write a roadmap of which combinations to try with the German “thorsten” dataset.
Good! This fork already supports Mozilla TTS spectrograms: https://github.com/freds0/wavegrad. You train with either GT spectrograms or the preprocessing script, then you save the spectrograms and load them there.
We can celebrate a birthday today: the first post in this thread was written on November 5th, 2019, exactly one year ago, and there’s no end in sight.
I just wanna say thank you to all of you guys (a list of all the names would probably be too long) who inspired, followed, motivated, helped and supported me on this journey.
and then visit http://localhost:5002 for a test page. There’s a /api/tts endpoint available, and it even mimics the MaryTTS API (/process) so you can use it in any system that supports MaryTTS.
If you happen to use Home Assistant, there’s a Hass.io add-on available as well.
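For anyone who wants to call the /api/tts endpoint from a script instead of the test page, here is a minimal sketch. It assumes the server accepts a GET request with the input in a `text` query parameter and answers with audio bytes; the parameter name, the response format, and the helper names `tts_url` and `synthesize` are my assumptions, not something confirmed in this thread, so check them against the server you are actually running.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and port as mentioned in the post above.
BASE_URL = "http://localhost:5002"


def tts_url(text: str, base_url: str = BASE_URL) -> str:
    """Build the request URL. The 'text' parameter name is an assumption."""
    return f"{base_url}/api/tts?{urlencode({'text': text})}"


def synthesize(text: str, out_path: str, base_url: str = BASE_URL) -> None:
    """Fetch audio for `text` and write the raw response bytes to `out_path`."""
    with urlopen(tts_url(text, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, `synthesize("Guten Tag", "out.wav")` would save whatever the endpoint returns. `urlencode` percent-encodes umlauts as UTF-8, so German input needs no special handling on the client side.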
Can I ask whether you guys modified the phoneme_cleaner function so that it does not perform transliteration, that is, turning special characters like ä into a? It is strange that you have problems with umlaut characters. I also train on a language with special characters and have no problems with pronunciation. However, I got better results once I implemented a pronunciation dictionary and espeak-ng. Worth a try.
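To make the transliteration point concrete, here is a small sketch of the difference. These two functions are hypothetical stand-ins, not the actual cleaner code from the TTS repo: the first strips diacritics (so ö becomes o before the phonemizer ever sees it), the second only lowercases and collapses whitespace.

```python
import re
import unicodedata


def strip_diacritics(text: str) -> str:
    """Transliterating behaviour: decompose characters and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_keep_umlauts(text: str) -> str:
    """Non-transliterating cleaner: lowercase and collapse whitespace only."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"\s+", " ", text).strip()
```

If a cleaner behaves like `strip_diacritics`, "können" reaches the phonemizer as "konnen" and the vowel quality is lost no matter how good the phoneme backend is. A cleaner like `clean_keep_umlauts` leaves the umlaut intact.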
I use a separate tool called gruut that uses a pronunciation dictionary and grapheme-to-phoneme model to generate IPA. It correctly produces /k œ n ə n/ for können, so I’m not sure why the problem persists.
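As a rough illustration of the dictionary-plus-G2P idea (this is not gruut's API, just a toy sketch): look the word up in a pronunciation dictionary first and fall back to a grapheme-to-phoneme function only for unknown words. That lookup order is my assumption about how such tools are typically combined. The only lexicon entry is the transcription quoted above, and `spell_out` is a placeholder, not a real G2P model.

```python
from typing import Callable, Dict, List

# Toy lexicon; the single entry is the transcription quoted in the post above.
LEXICON: Dict[str, List[str]] = {"können": ["k", "œ", "n", "ə", "n"]}


def spell_out(word: str) -> List[str]:
    """Placeholder fallback: returns the letters themselves (NOT real G2P)."""
    return list(word)


def phonemize(
    word: str,
    lexicon: Dict[str, List[str]],
    g2p: Callable[[str], List[str]],
) -> List[str]:
    """Return the dictionary pronunciation if present, else the G2P guess."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return g2p(key)
```

If the phonemes coming out of this stage are already correct, as they are for können here, then the mispronunciation is more likely introduced later, for example in the cleaner or in how the model's character set maps those symbols.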