What are the TTS models you know to be faster than Tacotron?

Kirian · February 2, 2021, 10:47am

I checked to make sure:

I’m using one GPU (RTX 2080). The model takes around 1 GB of GPU memory, and inference draws around 60 W of power.
If we split the process into TTS and vocoder, the TTS takes 98% of the time (around 0.3 RTF) and the vocoder only 2% of the time (0.006 RTF). MB MelGAN is really fast!
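
For anyone who wants to reproduce the split, here is a minimal sketch of how I read those numbers, assuming RTF means processing time divided by the duration of the generated audio. The function names are made up for illustration and the RTF values are just the ones quoted above.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent computing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def stage_shares(stage_rtfs):
    """Turn per-stage RTFs into each stage's fraction of the total time."""
    total = sum(stage_rtfs.values())
    return {name: rtf / total for name, rtf in stage_rtfs.items()}


# RTF values quoted in this post (TTS stage vs. vocoder stage).
shares = stage_shares({"tts": 0.3, "vocoder": 0.006})
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total time")
```

An RTF below 1 means the stage runs faster than real time, so 0.3 RTF is roughly 0.3 s of compute per second of audio.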

If we look closer at the TTS stage, Tacotron2 (or SpeedySpeech or Glow-TTS) takes around 30% of the time, and the phonemize() method takes 70% of the time. That seemed weird to me, so I am doing more profiling:
The espeak processing is quite slow, and I don’t understand exactly why at the moment. I used the Python script given in this thread to profile the time spent by espeak alone on sentences. The result is that it takes between 1 and 2 milliseconds (for sentences of 20 and 600 characters, respectively). So espeak itself should only be responsible for 0.001 to 0.005 RTF on my machine.
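
This is roughly how the wrapper around espeak can be profiled to see where the time goes. It is only a sketch using the standard-library `cProfile` and `pstats` modules. `phonemize_stand_in`, `slow_setup` and `fast_backend` are hypothetical placeholders, not the real phonemize() code, and the sleep is an invented overhead so the report has something to show. Swap in the real call to profile it.

```python
import cProfile
import io
import pstats
import time


def slow_setup():
    # Hypothetical stand-in for some per-call overhead in the wrapper.
    time.sleep(0.005)


def fast_backend(text):
    # Hypothetical stand-in for the espeak conversion itself.
    return text.lower()


def phonemize_stand_in(text):
    # Hypothetical wrapper; NOT the real phonemize() from any library.
    slow_setup()
    return fast_backend(text)


def profile_calls(func, sentences, top=10):
    """Run func on every sentence under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for sentence in sentences:
        func(sentence)
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return out.getvalue()


report = profile_calls(phonemize_stand_in, ["Hello world."] * 20)
print(report)
```

If the cumulative time of the wrapper is much larger than that of the backend call, the extra time is spent around espeak rather than in it.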

I will let you know when I find the reason for this slow espeak processing!
