I am new to AI, learning online and experimenting with what’s available on GitHub and Colab. I’m also taking the free fast.ai online course to get a better understanding.
The above preamble was meant to explain why my question below may seem noobish.
I am interested in eventually taking a pre-trained model and refining it to learn a different voice for use in TTS. It’s a purely personal project, so I don’t have any quality standards to adhere to.
My question is how one would go about achieving both of the following through model refinement, assuming I’m using a pre-trained LJSpeech model and have adequate speech samples of the other voices:
- Changing the voice to, using someone famous as an example, Dennis Quaid
- Changing the delivery to, again using someone famous as an example, Morgan Freeman
Therefore, the output would be a refined TTS voice that sounds like Dennis but speaks with the cadence and delivery of Morgan.
Is it actually possible to achieve that through transfer learning?
Or is this only possible by training a model from scratch on Morgan and then transfer learning to Dennis’ voice?