It looks like the phonemizer initialization is taking most of the time. This script comes from the phonemizer package.
This initialization occurs for every sentence to be predicted and was taking, on my machine, around 0.2 RTF!
I tried initializing it once, when the model is loaded, and succeeded in going from ~0.35 RTF to ~0.15 RTF. That is more than twice as fast with just this trick.
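For anyone who wants to try the same trick, here is a minimal sketch of the idea, not the actual patch. `SlowBackend` is a hypothetical stand-in for the phonemizer backend (its `time.sleep` only simulates the setup cost), and `get_backend` / `phonemize_cached` are names I made up for illustration:

```python
import time
from functools import lru_cache


class SlowBackend:
    """Hypothetical stand-in for a phonemizer backend with a costly setup."""

    init_count = 0  # counts how many times the costly setup has run

    def __init__(self, language):
        time.sleep(0.05)  # simulated initialization cost (assumption)
        SlowBackend.init_count += 1
        self.language = language

    def phonemize(self, text):
        # Placeholder transformation; a real backend would return phonemes.
        return text.lower()


@lru_cache(maxsize=None)
def get_backend(language):
    # Built once per language, then reused for every later sentence.
    return SlowBackend(language)


def phonemize_cached(text, language="en-us"):
    return get_backend(language).phonemize(text)


sentences = ["Hello world.", "How are you?", "Fine, thanks."]
outputs = [phonemize_cached(s) for s in sentences]
```

The setup cost is paid once, on the first call, instead of once per sentence. Calling `get_backend` at model loading time would move that cost out of the first prediction as well.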
The phonemize processing is now only taking 0.05 RTF, whereas tacotron2 is taking ~0.1 RTF, so tacotron2 is the bottleneck in this case. But if we take speedy_speech, the phonemize processing is once again the bottleneck.
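To make the per-stage comparison concrete, here is a small sketch of how such a breakdown can be computed, taking RTF as processing time divided by the duration of the generated audio. The helper names and the stage timings are illustrative assumptions, chosen only to reproduce the 0.05 and ~0.1 figures above; they are not measurements:

```python
import time


def rtf(processing_seconds, audio_seconds):
    """Real-time factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Illustrative timings (assumption) for 10 s of generated audio.
audio_seconds = 10.0
stage_seconds = {"phonemize": 0.5, "tacotron2": 1.0}

stage_rtf = {name: rtf(t, audio_seconds) for name, t in stage_seconds.items()}
bottleneck = max(stage_rtf, key=stage_rtf.get)  # stage with the largest RTF
total_rtf = sum(stage_rtf.values())
```

In a real run, `stage_seconds` would be filled by wrapping each stage call with `timed`, so the bottleneck can be re-checked whenever the model is swapped.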
I will keep digging into this phonemize stuff and try to optimize it.
BTW, has nobody else run into this heavy initialization time problem with the phonemizer?