Thanks to @sanjaesc I was able to run tune_wavegrad for noise scheduling. There's still work to do, but it's getting slightly better.
I’ve uploaded some samples on my comparison page:
And a short one here:
Generation (tested on CPU) is not fast; it runs well below real time:
Run-time: 81.68155550956726
Real-time factor: 9.704083679886214
Time per step: 0.00044009456989066355
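For anyone wondering how these three numbers relate, here is a rough sketch of the arithmetic. It assumes (I have not checked this against the synthesis script) that the real-time factor is run-time divided by audio duration, and that "time per step" is run-time divided by the number of generated audio samples. The variable names are mine, for illustration only.

```python
# Figures copied from the log output above.
run_time = 81.68155550956726            # wall-clock seconds
real_time_factor = 9.704083679886214
time_per_step = 0.00044009456989066355  # seconds

# Assumption: RTF = run_time / audio_duration, so invert it for the duration.
audio_seconds = run_time / real_time_factor

# Assumption: time per step = run_time / number of generated samples.
num_samples = round(run_time / time_per_step)

# If both assumptions hold, this is the sample rate of the output audio.
implied_sample_rate = num_samples / audio_seconds

print(f"audio length:        {audio_seconds:.2f} s")
print(f"generated samples:   {num_samples}")
print(f"implied sample rate: {implied_sample_rate:.0f} Hz")
```

Under those assumptions the clip is roughly 8.4 s long, and the implied sample rate comes out very close to 22,050 Hz, which suggests the reading is plausible. In other words, each second of audio takes nearly 10 s to generate on CPU.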
What do you think should come next?
a) More training on the taco2 model
b) More training on the wavegrad vocoder