Minimum viable English phoneme coverage set, LJSpeech-style?

I’m looking at Mozilla TTS and taking the following into consideration:




Unfortunately, what seems to me to be a simple question does not seem to have an answer that I can find:

What is the minimum viable set of sentences/utterances in English that provides good phoneme coverage for generating a model with a custom voice?

There are several examples where people tried this and didn’t get great results, with no phoneme sets or LJSpeech-type data made available. The original LJSpeech dataset is immense and is not organized in any kind of priority order. Other LJSpeech-style datasets are still quite large.

I understand English is a complex language. You would think, though, that it would be possible to find a minimum-viable dataset that covers all phonemes adequately for making recordings.
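To make concrete what I mean by "covers all phonemes": the selection itself looks like a set-cover problem, so here is a minimal sketch of a greedy pick. It is only an illustration under assumptions. The `LEXICON` below is a tiny made-up word-to-phoneme table with ARPAbet-style symbols, and the function name `greedy_cover` is hypothetical; a real run would need a full pronunciation dictionary or a grapheme-to-phoneme tool, and it only checks that each phoneme appears at least once, not how often or in which contexts.

```python
# Toy pronunciation table (assumption: ARPAbet-style symbols, invented for this sketch).
LEXICON = {
    "the": ["DH", "AH"],
    "a": ["AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "ship": ["SH", "IH", "P"],
}

SENTENCES = ["the cat sat", "a cat", "the dog", "a ship"]


def greedy_cover(sentences, lexicon):
    """Greedily pick sentences until every phoneme found in the corpus is covered."""
    # Map each sentence to the set of phonemes its known words contain.
    phones = {
        s: {p for w in s.lower().split() for p in lexicon.get(w, ())}
        for s in sentences
    }
    # Everything the corpus can cover at all; words missing from the lexicon are ignored.
    remaining = set().union(*phones.values()) if phones else set()
    chosen = []
    while remaining:
        # Take the sentence that adds the most still-missing phonemes
        # (ties go to the earliest sentence in the input list).
        best = max(sentences, key=lambda s: len(phones[s] & remaining))
        chosen.append(best)
        remaining -= phones[best]
    return chosen


selected = greedy_cover(SENTENCES, LEXICON)
print(selected)
```

With this toy data the sketch skips "a cat", since its phonemes are already covered by "the cat sat". Whether one occurrence per phoneme is anywhere near enough for training a voice is exactly what I don't know.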

Does such a thing exist, and I’m just bad at searching?


This claims that its phrases are “phonetically balanced”. The “en-us” section contains 1,132 sentences. I’m not sure if this is sufficient for a good dataset, but at least it will be a good starting point:

This is a great start, thanks!