In an ideal world, we would have one speaker per sentence, with unlimited sentences and speakers… But the main problem is the scarcity of text corpora, because of the CC0 requirement. In my opinion, this is where community work is most necessary.
Otherwise, AFAIK, a sentence being recorded by 2 or 3 different people (accent, gender, age, …) may not be a bad thing, provided that the sentence corpus covers the vocabulary (and thus the phonemes) well enough. What we want to avoid is something like 5000 sentences recorded by 15 people each, which we do encounter in datasets.
Here is an example calculation for the requirements. In my language, I aim to not exceed 2 recs/sentence - for now.
There are many ways to generate a text corpus, but all of them require a lot of work. The sentences MUST be correct. One problem that needs solving is the “domain” problem. Every one of us can create sentences from our daily life, which would result in a base that can, for example, understand “aspirin” but not “acetylsalicylic acid” (which is aspirin). IMHO, this is not hard jargon that should be left out. Everything we learn in pre-university education should be understood by a system we build. For levels beyond that, fine-tuning with a specially prepared domain-specific corpus would be needed.
So, if the text corpus is small and recs/sentence is high (say >5), I concur that promoting more recording will be pointless, even harmful. But text-corpus generation could be promoted instead.
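To make that rule of thumb concrete, here is a rough Python sketch. The function names and the return labels are my own illustration, not part of any project tooling; the default threshold of 5 is just the "say >5" figure from above and is an assumption you would tune per language.

```python
def recordings_per_sentence(num_recordings: int, num_sentences: int) -> float:
    """Average number of recordings per unique sentence in a dataset."""
    if num_sentences <= 0:
        raise ValueError("num_sentences must be positive")
    return num_recordings / num_sentences


def suggest_focus(num_recordings: int, num_sentences: int,
                  high_threshold: float = 5.0) -> str:
    """Hypothetical helper: decide what a campaign should promote.

    If sentences are already recorded many times over (ratio above the
    assumed threshold), more recording adds little, so promote
    text-corpus generation instead.
    """
    ratio = recordings_per_sentence(num_recordings, num_sentences)
    return "text-corpus" if ratio > high_threshold else "recording"


# The case mentioned earlier: 5000 sentences recorded by 15 people each.
print(recordings_per_sentence(5000 * 15, 5000))
print(suggest_focus(5000 * 15, 5000))
# A dataset at 2 recs/sentence still has room for recording.
print(suggest_focus(10000, 5000))
```

This only looks at the average; it says nothing about vocabulary or phoneme coverage, which would need a separate check on the sentences themselves.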
This is usually done through campaigns prepared by language community leads, but only a small percentage of languages have such active communities.
Another problem here is that these communities are on their own. They do not have enough time or resources to reach beyond family, friends, and colleagues, who are mainly highly educated people from large cities. These people are very good for creating the text corpus and for validation work (after some recording), but we also need voice diversity from rural areas (local tongues) and from people living in other countries (regional dialects).
Such a global campaign will help these goals tremendously. But the call should not be “to just record”; it should be to come to the project and join or form communities, so that more knowledgeable people (read “those who know the dataset coverage and the issues in it”) can direct the newcomers to the correct channel.
In short, I’d say: Do it for every language, but in a more civil-societal / community-based manner.