I believe we’ve done almost everything practically possible on Tacotron. Mozilla TTS has the most robust public Tacotron implementation so far. However, it is still slightly slow for low-end devices.
It is time for us to go for a new model. I just want to ask your opinion about what model we should use for this next iteration. You can also share some papers if you like.