For a university course project, a few of us explored different TTS techniques for generating emotional speech, both HMM-based and deep-learning-based. All our experiments are here: https://github.com/Emotional-Text-to-Speech
There is a gap in the literature on fine-tuning pre-trained TTS models (trained on large datasets like LJ Speech) with low-resource (emotional) speech data. We tried a lot of approaches, most of which didn't work out, and we thought the TTS community could benefit from our findings and build on these experiments. They are documented here: https://github.com/Emotional-Text-to-Speech/dl-for-emo-tts
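For readers unfamiliar with the setup, the low-resource fine-tuning idea can be sketched as: load a model pre-trained on a large corpus, freeze most of its weights, and update only a small part on the new emotional data. The snippet below is a minimal, hypothetical illustration in PyTorch; `TinyTTS` and its layers are stand-ins for a real acoustic model (the actual models and training code are in the linked repositories).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained TTS acoustic model.
# In practice this would be something like Tacotron 2, with the
# encoder weights coming from training on LJ Speech.
class TinyTTS(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(8, 16)   # "pre-trained" part, kept frozen
        self.decoder = nn.Linear(16, 4)   # part adapted to the emotional data

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))

model = TinyTTS()

# Freeze the encoder so the small emotional dataset only updates the
# decoder -- a standard trick when fine-tuning on low-resource data.
for p in model.encoder.parameters():
    p.requires_grad = False

opt = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-2
)
loss_fn = nn.MSELoss()

# Dummy "low-resource" batch standing in for emotional speech features.
x = torch.randn(32, 8)
y = torch.randn(32, 4)

enc_before = model.encoder.weight.clone()
dec_before = model.decoder.weight.clone()

for _ in range(50):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

After training, the frozen encoder weights are untouched while the decoder has adapted; the same pattern scales up to freezing whole encoder/attention stacks in a real TTS model.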
We've also released the models for all the approaches we tried (even the ones that didn't work), along with the corresponding code for reproducibility and some demos that can be played with!
Suggestions and comments are most welcome!