What’s the plan for the GST branch? Are you planning to implement the "Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis" paper?
I plan to learn speaker style with GST and steer it at inference time. However, so far what GST learns is quite random, and hence hard to manipulate.
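To make "steering at inference" concrete, here is a minimal sketch, not the code in the branch. It assumes a single-head simplification of GST in which the style embedding is a weighted sum of tanh-squashed token vectors. The names `style_embedding`, `NUM_TOKENS`, and `TOKEN_DIM`, the token values, and the scores are all made up for illustration.

```python
import math
import random


def softmax(scores):
    """Turn raw attention scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(tokens, weights):
    """Weighted sum of tanh-squashed style tokens (single-head simplification)."""
    dim = len(tokens[0])
    return [
        sum(w * math.tanh(tok[d]) for w, tok in zip(weights, tokens))
        for d in range(dim)
    ]


# Hypothetical token bank: in a real model these are learned parameters.
random.seed(0)
NUM_TOKENS, TOKEN_DIM = 4, 8
tokens = [[random.gauss(0.0, 0.5) for _ in range(TOKEN_DIM)] for _ in range(NUM_TOKENS)]

# Training-style path: weights come from attention scores (made-up values here).
learned_weights = softmax([0.2, 1.5, -0.3, 0.1])
emb_learned = style_embedding(tokens, learned_weights)

# Inference-time steering: bypass attention and pick the weights by hand,
# here selecting only the third token.
manual_weights = [0.0, 0.0, 1.0, 0.0]
emb_manual = style_embedding(tokens, manual_weights)
```

If the tokens do not correspond to interpretable styles, hand-picked weights like `manual_weights` will not produce a predictable change in the output, which is the difficulty described above.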