Minimum viable English phoneme coverage set, LJSpeech-style?

thoraxe · October 5, 2021, 8:15pm

I’m looking at Mozilla TTS and taking the following into consideration:

Unfortunately, what seems to me to be a simple question does not seem to have an answer that I can find:

What is the minimum viable set of sentences/utterances in English that provides good phoneme coverage for generating a model with a custom voice?

There are several examples where people tried to train a custom voice and didn’t get great results, and none of them point to a phoneme set or LJSpeech-type data that is available. The original LJSpeech dataset is immense and is not organized in any priority order. Other LJSpeech-style datasets are still quite large.

I understand English is a complex language. You would think, though, that it would be possible to find a minimum-viable dataset that covers all phonemes adequately for making recordings.

Does such a thing exist, and I’m just bad at searching?
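To make the question concrete, here is a minimal sketch of how a small covering set could be picked from a larger sentence pool with a greedy set-cover pass. Everything in it is an assumption for illustration: `TOY_LEXICON` is a hypothetical hand-written word-to-phoneme table (a real run would need a full pronunciation dictionary or a phonemizer), and `phonemes_of` and `greedy_cover` are made-up helper names. Greedy selection gives a small set, not a provably minimal one, and covering each phoneme once says nothing about whether the result is enough data to train a good voice.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: word -> ARPAbet-style phonemes (stress marks omitted).
# This is illustration data only, not a real pronunciation dictionary.
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "a": ["AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "she": ["SH", "IY"],
    "sees": ["S", "IY", "Z"],
    "dog": ["D", "AO", "G"],
}


def phonemes_of(sentence: str, lexicon: Dict[str, List[str]]) -> Set[str]:
    """Return the set of phonemes in a sentence; unknown words are skipped."""
    found: Set[str] = set()
    for word in sentence.lower().split():
        found.update(lexicon.get(word.strip(".,!?;:"), []))
    return found


def greedy_cover(sentences: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Repeatedly pick the sentence that adds the most still-missing phonemes."""
    per_sentence = {s: phonemes_of(s, lexicon) for s in sentences}
    remaining: Set[str] = set().union(*per_sentence.values())
    chosen: List[str] = []
    while remaining:
        best = max(sentences, key=lambda s: len(per_sentence[s] & remaining))
        gained = per_sentence[best] & remaining
        if not gained:  # nothing left that any sentence can add
            break
        chosen.append(best)
        remaining -= gained
    return chosen


pool = ["The cat sat.", "She sees a dog.", "A cat."]
selected = greedy_cover(pool, TOY_LEXICON)
print(selected)
```

With this toy pool, two of the three sentences already cover every phoneme the pool contains. A more realistic version would probably also want to count how often each phoneme occurs and look at phoneme pairs, not only single phonemes.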

dkreutz · October 6, 2021, 12:30pm

This project claims that its phrases are “phonetically balanced”. The “en-us” section contains 1,132 sentences. I’m not sure whether that is sufficient for a good dataset, but it should at least be a good starting point:

thoraxe · October 7, 2021, 7:04pm

This is a great start, thanks!
