I am new to deep learning, so please forgive any gaps in my knowledge. I am happy to learn.
I came across the Tacotron2-iter-260K model, which has a SoundCloud link with samples that sound awesome. However, after a lot of troubleshooting I got it deployed, and the result was not what I expected: it sounded much worse. After digging deeper, I noticed that the samples were probably generated in combination with WaveRNN. But how do I continue from here? I set up WaveRNN with the pre-trained models from here:
How do I use WaveRNN in combination with the Tacotron2 model? And how do I get results like the ones in the SoundCloud samples for the Tacotron2 model?
Also, is there any way to speed up WaveRNN? I have a 2080 Super and a 3900X, but both sit at around 10% usage, and WaveRNN takes about 2 minutes for one short sentence.
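To make the speed numbers comparable, here is a minimal sketch of how the synthesis time could be measured as a real-time factor (seconds of compute per second of generated audio). Everything in it is an assumption for illustration: `synthesize` is a hypothetical callable standing in for the whole Tacotron2 + WaveRNN pipeline, `fake_synthesize` is only a placeholder, and the 22050 Hz sample rate is assumed and should be replaced by whatever the model config actually uses.

```python
import time


def real_time_factor(synthesize, text, sample_rate):
    """Time one synthesis call and relate it to the audio length.

    `synthesize` is a hypothetical callable (text -> sequence of samples);
    it is not part of any real Tacotron2 or WaveRNN API.
    Returns (elapsed_seconds, audio_seconds, rtf).
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start

    audio_seconds = len(samples) / sample_rate
    # rtf < 1.0 means audio is generated faster than it plays back.
    rtf = elapsed / audio_seconds if audio_seconds > 0 else float("inf")
    return elapsed, audio_seconds, rtf


def fake_synthesize(text):
    # Placeholder only: returns 0.1 s of silence per character at 22050 Hz.
    return [0.0] * (len(text) * 2205)


if __name__ == "__main__":
    elapsed, audio_seconds, rtf = real_time_factor(fake_synthesize, "hello", 22050)
    print(f"compute: {elapsed:.4f}s, audio: {audio_seconds:.2f}s, rtf: {rtf:.4f}")
```

Swapping `fake_synthesize` for the real pipeline call would show how far the current setup is from real time.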
I intend to use this for a real-time text-to-speech application, so I would be happy if there were a way to get synthesis down to about 5 seconds at most. Thanks in advance for any help.