It’s worth reviewing the settings for the power parameter in your config.json file post-training to see if you can improve the quality of your output speech.
Adjusting various parameters, including power, is covered here, so you may well have looked at it already, but it’s potentially worth going a little higher than the upper bound of 1.5 suggested in the associated comment in the config file. I got the best results at around 1.8-1.9.
NB: this only applies if you’re using Griffin-Lim.
Credit for making me look closer at this goes to the poster of this reply to an issue here (for another similar TTS repo that also uses Griffin-Lim).
To demonstrate the impact, there are two samples in the attached zip file: one with power at 1.4 (more “robotic” / “reverberating”) and one at 1.8 (which to my ear sounds more natural). You’ll likely want to try various levels, YMMV. Beware: if you go too far, the voice starts to sound quieter and more muffled. Fortunately, this can all be experimented with post-training.
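For anyone wondering why the exponent changes the sound, here is a rough sketch of what I understand the parameter to do. It assumes power is applied as an element-wise exponent on the linear magnitude spectrogram before Griffin-Lim runs; that is my reading, so check your own code. The function names and the toy frame values are made up for illustration and are not taken from the repo.

```python
import math

def apply_power(magnitudes, power):
    """Raise each linear magnitude to `power` (assumed pre-Griffin-Lim step)."""
    return [m ** power for m in magnitudes]

def peak_to_floor_db(magnitudes):
    """Ratio between the loudest and quietest bin, in dB."""
    return 20.0 * math.log10(max(magnitudes) / min(magnitudes))

# Toy frame: two harmonic peaks over a weak noise floor (made-up values).
frame = [0.05, 0.8, 0.05, 0.4, 0.05]

for power in (1.0, 1.4, 1.8, 2.2):
    out = apply_power(frame, power)
    print(f"power={power}: peak/floor={peak_to_floor_db(out):.1f} dB, "
          f"peak level={max(out):.3f}")
```

The peak-to-floor ratio in dB grows in proportion to the exponent, so peaks stand out more against the low-energy bins. In this toy frame all values are below 1, so the peak level also drops as power goes up; whether that is what makes the real output quieter at high settings is a guess on my part.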
Thought I’d share it in case others found it useful.
adjusting_power_parameter.zip (253.0 KB)