Hello. I’m also very interested in any strategy that makes it feasible to run a decent TTS system on limited hardware (for instance, ARM devices).
Microsoft’s FastSpeech paper looks very promising.
I’ve listened to the samples, and one could argue that Mozilla’s demo sounds better. However, Mozilla TTS takes too long to generate audio, and I’d rather have a mediocre voice than no voice at all. The usual local open source systems (espeak, MaryTTS, flite) sound horrible; FastSpeech is much better than those.
@geneing 1) The WaveRNN vocoder version seems to sound much better than the Griffin-Lim vocoder version. How do the generation times compare? I want to run it on a low-power device using only the CPU. Is that possible?
2) Is it possible to train your implementation on another language? If so, are there step-by-step instructions somewhere?
Thanks for your great work!
@erogol There must be some way of choosing ForwardTacotron (or another FastSpeech-inspired implementation) when using Mozilla’s TTS. This could make it usable on low-power devices and in voice assistants, and pave the way for the “Open Web” and more privacy. Are other people at Mozilla concerned about the high hardware requirements of the current Mozilla TTS implementations?