How will the lack of female voices be handled?

To start, I'm not here to discuss politics, I just want genuine answers to this question.

Looking around, I noticed that it says 41% of voice recordings are male, and only 10% female. My question is how will this be handled in the future? The programs learning with this database would need voices of many pitches, and right now this database doesn’t really offer that… People training AIs off of this collection will probably want an even number of male to female voices, or at least a better ratio than 1 out of 5.

I’m just curious if anything will be done when Common Voice is more officially put out there? Maybe separate packages to sort different pitches of voice? That would allow people to use an even number of different voice types.

And I mean… this is called Common Voice, shouldn’t it be the common voice? I think a database with many languages and accents is valuable, but is it practical if it doesn’t include many references to literally half of the world’s voices?

Hi @CatLadyish – I see this is your first post, so let me say thank you and welcome to the community!

Let me handle the first, technical side of this question: Meta-data associated with each clip are part of our dataset releases, including people’s self-identified gender. We don’t ask people to self-identify their voice pitch and aren’t planning to perform any analysis to label different clips with information about the pitch. In addition, the site allows people to contribute without entering meta-data – we thought this was important to ensure greater inclusiveness of the project for people who didn’t want to provide this information.
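Since self-identified gender ships with the metadata, anyone using a release can tally the labels and build a balanced subset themselves. Here is a minimal sketch of that idea, not an official tool. The column names `path` and `gender`, the tab-separated layout, and the sample rows are assumptions made up for illustration, and the helper names are hypothetical.

```python
import csv
import io
import random
from collections import Counter, defaultdict

# Made-up sample rows (assumption): a real release has more columns and clips.
SAMPLE_TSV = (
    "path\tgender\n"
    "clip1.mp3\tmale\n"
    "clip2.mp3\tmale\n"
    "clip3.mp3\tfemale\n"
    "clip4.mp3\t\n"
    "clip5.mp3\tmale\n"
    "clip6.mp3\tfemale\n"
)


def load_rows(tsv_text):
    """Parse tab-separated metadata into a list of dicts, one per clip."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def gender_counts(rows):
    """Count clips per gender label; an empty label counts as 'unspecified'."""
    return Counter(row["gender"] or "unspecified" for row in rows)


def balanced_subset(rows, labels=("male", "female"), seed=0):
    """Randomly downsample so every label in `labels` has the same clip count."""
    by_label = defaultdict(list)
    for row in rows:
        if row["gender"] in labels:
            by_label[row["gender"]].append(row)
    # The smallest group sets the size of every group in the subset.
    n = min(len(by_label[label]) for label in labels)
    rng = random.Random(seed)
    subset = []
    for label in labels:
        subset.extend(rng.sample(by_label[label], n))
    return subset


rows = load_rows(SAMPLE_TSV)
print(gender_counts(rows))
print([row["path"] for row in balanced_subset(rows)])
```

Note that clips without a gender label are left out of the balanced subset, and downsampling throws away data from the larger group, so this only helps with the ratio, not with the total amount of female speech.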

Your broader point is critical and very much the core intention of Common Voice: That this project attracts a large diversity of voices, which includes gender diversity. We haven’t yet looked at the various dimensions of diversity and run campaigns to encourage specific groups under-represented in the dataset to contribute. The hope is to do that in the future.

1 question for you, and 1 thought:

  1. Do you have ideas on how we could make the site and project more amenable to gender diversity? Or more specifically, do you think there’s anything the site/project are doing right now that is dissuading women from contributing, in particular?

  2. I know that different language communities have rallied to see more of their language represented. I wonder what it would look like for other types of communities to organize around Common Voice to encourage participation – around gender, age, accent, some other dimensions?

Thanks again!

Common Voice definitely needs more female voice samples, but once SpecAugment lands it should improve due to the random pitch augmentations.
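For anyone unfamiliar with it: SpecAugment, as published, works on the spectrogram rather than the raw audio. It applies time warping plus random masks over bands of frequency channels and spans of time steps. Below is a toy sketch of the two masking steps only (time warping is omitted). The function name, the list-of-lists spectrogram, and the mask sizes are all assumptions for illustration, not the code that will land in any particular project.

```python
import random


def spec_augment(spec, max_freq_mask=2, max_time_mask=3, rng=None):
    """Return a masked copy of `spec`.

    `spec` is a list of frequency rows, each a list of time-step values.
    One random band of frequency rows and one random span of time steps
    are set to 0.0. The mask widths are drawn from 0..max and must not
    exceed the spectrogram's dimensions.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched

    # Frequency mask: zero out `f` consecutive frequency channels.
    f = rng.randint(0, max_freq_mask)
    f0 = rng.randint(0, n_freq - f)
    for i in range(f0, f0 + f):
        out[i] = [0.0] * n_time

    # Time mask: zero out `t` consecutive time steps in every channel.
    t = rng.randint(0, max_time_mask)
    t0 = rng.randint(0, n_time - t)
    for row in out:
        for j in range(t0, t0 + t):
            row[j] = 0.0

    return out


# Toy 8-channel, 10-step "spectrogram" filled with ones.
toy = [[1.0] * 10 for _ in range(8)]
masked = spec_augment(toy, rng=random.Random(0))
for row in masked:
    print(row)
```

This kind of augmentation can make a model less dependent on any one narrow frequency range, but it does not add new speakers, so it complements more female recordings rather than replacing them.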

@CatLadyish, listening to the recordings, my guess is that the proportion of female voices is even lower than you’ve suggested. A very large proportion of readers don’t bother to specify gender at all, and I think that those ‘unknown’ readers are even more heavily weighted towards male voices than those who do specify gender. It is a real problem.

I think this issue is important, and I’m putting a note here to make sure in the upcoming campaign we put some emphasis on the diversity of the voices (both gender and age).

@konstantina Let’s make sure we have this covered somehow when engaging with calls to action, specifically calling for more female voices and more diversity in age.

@lsaunders Do we have stats about the current diversity of voices? Maybe this would be interesting to expose on the UI to encourage people? (@mbranson)

@nukeador I don’t currently have those stats but will work with @gweber on how to pull that data over the next couple of weeks.


A great voice to use would be Edith Skinner’s. She was the last great teacher of the North American Theater Standard. Intelligibility is there in spades. There should be resources of her reading scripts.

One of the problems I foresee with a limited female set is a higher proportion of errors. The Venn diagram of public speakers and tech people has little overlap, and I imagine the smaller the net one casts, the worse it could be. If you have fewer people from fewer walks of life:

You might not get a female biologist who can say “antennae” correctly (it ends with the vowel of “Nike” or “banshee,” not the “ay” of “parlay”).

You might not get a female speaker who knows that cadre, fin de siecle, laissez-faire, rendez-vous, savoir faire, je ne sais pas, and forte are from French and not Spanish. (KAHD ra, FAN da see EK la, less SAY FAIR, rahn day VOO, sah VWAR FAIR, zhan SAY pah, fort).

You might not get a female speaker who knows that “alpha” is “alfa” (not “awl fa”)…

You might not get a female speaker to say “Worcestershire” correctly.

You might not get a female speaker to say “Saratoga” or “Giacomo” correctly.

And so on… With rejections from the people validating the recordings, this will make the number of accepted female voices even smaller.

Vice versa, the people who are public speakers (maybe they work in radio or politics) are not necessarily recording here. There is not much publicity for this project. I would suggest putting notices on the Mozilla Firefox start screen.

Additionally, females are more likely to start using new speaking styles. Yes, I realize that “cupboard” became “cubbard” and so on. It’s a fait accompli. There are certain innovations that many if not most speakers may reject.

For example, a lot of American women and Californians (male or female) do not only say “oo” as “eeww” and “apple” as “ahpple” and “palm” as “pom/pawm” … some also merge the vowel in “nurse” with the one in “love” and the “a” in “comma.”

You are less likely to get any females without the cot-caught merger. As a result, the machine is likely to misunderstand any NY woman saying “walk.” You are less likely to get women who say “apple” as “EE-apple” (as in the mid-Northern states).

There are audio CDs for the (since edited) book by Skinner, Speak with Distinction. Lots and lots of repeating sounds: “He bangs his fists against the posts and still insists he sees the ghosts” “Boyish Roister Doister.”

To get West Coast accents, especially female ones, a lot of EFL material will have distinct California, Pac Northwest, or British Columbia accents.

Even though it is “outdated,” old theater and TV recordings from the 1950s and before will have roughly American Northeast accents. A lot of Hollywood accents are California influenced (it takes me out of the movie to hear a Jersey Senator speak that way, but that’s another topic altogether).

Would radio plays be a corpus source? That is likely to have female voices. Luckily for the purposes of building a corpus, American theater does not have the tradition of men playing women’s parts to the same degree as in China, for example.

I also suggest these because the copyright has probably lapsed on many old things or the owners never explicitly asserted copyright in that window of time that, for example, made The Night of the Living Dead become public domain.

@nukeador I’d definitely be curious how we can leverage a campaign to bring extra messaging to the site rather than integrating this messaging into the contribution UI itself. The overall site messaging is purposefully open and focused on the humanity of speech as it is subjectively unique to each individual. Our project goal is not to promote one voice over another and the campaigns would be a natural way to enhance that message and communicate dataset diversity needs in a more focused way. Let’s work together on this and get some integrations planned!

Two notes:

  • We don’t just need more female voices for example, but a big diversity of them (not just a few)
  • If there are scripts and voices in the public domain we could see if we can use them, but that will require someone to set them up in sync and split them into short sentences. Do you know if this content has an expired copyright or uses a public domain license?