Is there a way to have the TTS model generate spoken phonemes in addition to spoken words? For example, to have the model read out the sentence “hɛˈləʊ wɜːld hello world” and have it say “hello world” twice in a row.
If you’re happy to write some code, this shouldn’t be particularly hard. I suspect it’ll be marginally easier if each sentence is written either purely in the regular alphabet or purely in IPA characters, rather than mixing the two within one sentence.
Take a look at the code that converts the text to phonemes. Check each sentence for the presence of IPA characters; if any are found, bypass the step that turns the text into phonemes and pass the IPA characters directly to the TTS model.
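As a rough illustration of that check, here is a minimal sketch in Python. It is not taken from any particular TTS codebase: `contains_ipa`, `to_phonemes` and the `text_to_phonemes` callback are hypothetical names, and the callback stands in for whatever phonemizer step your TTS project actually uses. The detection is a heuristic that only looks at the Unicode "IPA Extensions" and "Spacing Modifier Letters" blocks, so it will miss IPA symbols that live elsewhere (for example æ, ð, θ, ŋ) and transcriptions that happen to use only plain Latin letters.

```python
from typing import Callable

# Unicode blocks that hold most IPA-specific symbols (heuristic, not exhaustive):
#   U+0250..U+02AF  IPA Extensions (e.g. ɛ, ə, ʊ, ɜ)
#   U+02B0..U+02FF  Spacing Modifier Letters (e.g. the stress mark ˈ and length mark ː)
IPA_RANGES = ((0x0250, 0x02AF), (0x02B0, 0x02FF))


def contains_ipa(text: str) -> bool:
    """Return True if any character falls inside one of the IPA_RANGES blocks."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in IPA_RANGES)


def to_phonemes(sentence: str, text_to_phonemes: Callable[[str], str]) -> str:
    """Return the phoneme string to feed to the TTS model.

    `text_to_phonemes` is a placeholder for the project's own
    text-to-phoneme conversion; it is an assumption, not a real API.
    """
    if contains_ipa(sentence):
        # Sentence already looks like IPA: skip phonemization entirely.
        return sentence
    return text_to_phonemes(sentence)
```

With this approach a sentence such as “hɛˈləʊ wɜːld” would be passed through untouched, while “hello world” would go through the normal conversion. A sentence that mixes both, like the one in the question, would be treated as IPA as a whole, which is why keeping each sentence to one form is simpler.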
BTW, your subject description is confusing, as it implies you’re trying to do something else. To be consistent, I’d suggest that “Generate audio from phonemes” would make more sense.