I’m looking at Mozilla TTS and taking the following into consideration:
Unfortunately, what seems to me to be a simple question does not seem to have an answer that I can find:
What is the minimum viable set of sentences/utterances in English that provides good phoneme coverage for generating a model with a custom voice?
There are several examples where people tried this and didn’t get great results, and I can’t find a compact phoneme set or a small LJSpeech-type dataset anywhere. There is the original LJSpeech dataset, which is immense and is not organized in any kind of priority order. There are other LJSpeech-style datasets, but they are still really quite large.
I understand English is a complex language. You would think, though, that it would be possible to find a minimum-viable dataset that covers all phonemes adequately for making recordings.
Does such a thing exist, and I’m just bad at searching?
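To make concrete what I mean by "minimum viable", here is a rough sketch of the kind of selection I have in mind: greedily pick sentences from a candidate pool until every phoneme seen in the pool is covered at least once. Everything in it is an assumption for illustration. The tiny lexicon, the candidate sentences, and the helper names `phonemes_in` and `greedy_cover` are made up, and a real attempt would need a full pronouncing dictionary or a grapheme-to-phoneme tool in their place.

```python
# Toy pronunciation lexicon (hypothetical, ARPAbet-style symbols).
# A real run would replace this with a full pronouncing dictionary.
TOY_LEXICON = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "barks": ["B", "AA", "R", "K", "S"],
    "she": ["SH", "IY"],
    "sees": ["S", "IY", "Z"],
}


def phonemes_in(sentence, lexicon):
    """Return the set of phonemes for the words of `sentence` found in `lexicon`."""
    found = set()
    for word in sentence.lower().split():
        # Unknown words contribute nothing rather than raising an error.
        found.update(lexicon.get(word.strip(".,!?"), []))
    return found


def greedy_cover(sentences, lexicon):
    """Greedily pick sentences until every phoneme in the pool is covered once."""
    remaining = set()
    for s in sentences:
        remaining |= phonemes_in(s, lexicon)

    chosen = []
    while remaining:
        # Take the sentence that adds the most not-yet-covered phonemes.
        best = max(sentences, key=lambda s: len(phonemes_in(s, lexicon) & remaining))
        gain = phonemes_in(best, lexicon) & remaining
        if not gain:
            break  # defensive: nothing left that any sentence can add
        chosen.append(best)
        remaining -= gain
    return chosen


candidates = [
    "The cat sat.",
    "The dog barks.",
    "She sees the cat.",
    "The cat.",
]
selected = greedy_cover(candidates, TOY_LEXICON)
print(selected)
```

On this toy pool, two of the four sentences are enough to cover every phoneme once. Greedy selection only approximates the true minimum, and a single occurrence of each phoneme is probably not "adequate" coverage for training, so this is just the shape of what I’m after, not a working recipe.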