Tags for voice (accent)

Hi! I have a question. When I add new tags for my voice (accent), are they used only for new audio clips or for old clips as well? I hope the first is correct!

I’m speaking about this: [screenshot]

This is a very good question, not only for accent, but for all metadata (age and gender).
As far as I can see, they are only valid for new recordings (those made after you change the setting). If I’m mistaken, somebody please correct me.

That would be logical in many cases:

  • If somebody is 59 and has “fifties” set in metadata, and changes it to “sixties” the following year, only new data should be changed.
  • If somebody switches sex, old data should be kept as it is.
  • If somebody can speak different accents, that info should come from the current setting, hoping he/she is aware of this fact and keeps track of it.

The only problem I see in this process is with logged-in/logged-out recordings, as indicated in the following issue: https://github.com/common-voice/CorporaCreator/issues/117

Thank you!

I agree. I asked because my nose is blocked and my voice has changed for now, so I added a remark about it to my tags. Once my nose is unblocked, I want to remove the tag.

@Flay, this might not be the full answer; I haven’t had the time to check the database and code…

On a side note, I don’t think you need to change your accent information if your voice changes. Accent is something different.

If you are recording many sentences, it is a good thing that your voice changes. I change recording devices (desktop, mobile, laptop), environments (studio, home, balcony, etc.) and my distance to the device, so my voice level varies in every combination. That should help prevent voice bias.

Even if you are targeting an application that is tuned to your own voice (e.g. a home automation system or other security-related stuff), it is a good thing, because otherwise you would not be able to command the system when you have a cold.

Having people select their age range is an oversight, I guess. Have them select the decade they were born in instead, and use the recording date to work out the age range. If a bunch of your samples are in one bracket and then move to the next bracket on your birthday, you’ve leaked your date of birth.
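To make the idea concrete, here is a minimal sketch of deriving the possible ages from a stored birth decade and the recording year. The function name and the way the decade is stored (its first year, e.g. 1980) are my own assumptions for illustration; this is not how Common Voice stores or computes anything.

```python
from typing import Tuple


def possible_ages(birth_decade: int, recording_year: int) -> Tuple[int, int]:
    """Return the (youngest, oldest) age a speaker could be in recording_year.

    birth_decade is the first year of the decade, e.g. 1980 for 1980-1989
    (a hypothetical storage convention, chosen for this example).
    """
    if birth_decade % 10 != 0:
        raise ValueError("birth_decade must be the first year of a decade")
    if recording_year < birth_decade:
        raise ValueError("recording_year is before the birth decade")
    # Oldest case: born in the first year of the decade, birthday already passed.
    oldest = recording_year - birth_decade
    # Youngest case: born in the last year of the decade, birthday not yet reached.
    youngest = recording_year - (birth_decade + 9) - 1
    return max(youngest, 0), oldest


print(possible_ages(1980, 2021))
```

The stored value never changes, and the derived range only shifts with the calendar year, so nothing in the data moves on the speaker's birthday.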

Accent or gender tags should be applied to all samples that didn’t have one before, IMO. Is it possible to get all samples for a specific speaker out of the data set and sort them by recording date? If so, people can apply changes retrospectively, and also make up their own minds about the usefulness of the tags of speakers who switch them too much.

I asked about that last week in chat. It might be that the numbering in the path field is strictly increasing (it looked like a unique ID number to me). If so, one can sort the DataFrame in ascending order on that field and get the clips in time order. But I’m not sure, and it is not verified… I might download my own recordings and check; I haven’t had the time yet…
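A rough sketch of that sort, in plain Python so it has no dependencies (the same key function would work for sorting a DataFrame). It assumes rows with `client_id` and `path` fields and file names that end in a number, like `common_voice_en_1000.mp3`. Whether that number really follows recording time is exactly the unverified part; the helper names are made up for this example.

```python
import re
from typing import Dict, List


def clip_number(path: str) -> int:
    """Extract the trailing number from a clip file name."""
    match = re.search(r"(\d+)\.\w+$", path)
    if match is None:
        raise ValueError(f"no trailing number in path: {path!r}")
    return int(match.group(1))


def clips_in_id_order(rows: List[Dict[str, str]], client_id: str) -> List[Dict[str, str]]:
    """Return one speaker's rows sorted by the number in the path field.

    Assumption (not verified): a larger number means a later recording.
    """
    mine = [row for row in rows if row["client_id"] == client_id]
    # Sort numerically; a plain string sort would put ..._1000 before ..._999.
    return sorted(mine, key=lambda row: clip_number(row["path"]))


rows = [
    {"client_id": "a", "path": "common_voice_en_1000.mp3"},
    {"client_id": "b", "path": "common_voice_en_500.mp3"},
    {"client_id": "a", "path": "common_voice_en_999.mp3"},
]
for row in clips_in_id_order(rows, "a"):
    print(row["path"])
```

Comparing the numbers as integers matters here, because the file names are not zero-padded in this example.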

I constantly warn the language community to keep their demographics info right and make sure they are logged in, if they have chosen to give their demographics info.

I have no idea how much one’s voice changes over the years, and what effect that would have on the training of a voice AI. Sure, your child, teen, adult and elderly voices are different, but time is continuous. There will be no difference between 39 and 40 unless something happened to your vocal cords in the meantime, which could happen at any age.

Perhaps @ftyers can give us some clues on that…

Yeah, I mean from a privacy perspective. If you have an idea when a sample was made and which user it belongs to, and on my 40th birthday I move from the 30-39 to the 40-49 age bracket, then you know my date of birth. So by having users set their own age range rather than decade of birth, you’re encouraging them to leak their DoB.

That seems more dangerous than just deanonymizing them. Like, from an actual practical real-world perspective: you have their voice print and date of birth, and some banks are using voice ID for security. Get a name by scraping the forums or spearphishing, some more info by regular reconnaissance, and you could empty their bank account. It’d apply to what… 1/20 users who used the site for 6 months and also kept their metadata current? Not a huge population, but definitely exploitable with enough users and smart ways to filter the data.

Oh, I totally misread what you wrote… Sorry :frowning:

You can only see the change once every 3 months (at each release), so you cannot pin down the exact date…

If I have a point of reference I can. Like if I have stats that say there were 1000 recordings a day on average, and I’ve got sequence numbers for this data set and an old one, I can likely guess the date of a file from that. Let’s all just be glad that software pays enough money to keep me honest :joy:
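The arithmetic is just linear interpolation from a known point. A minimal sketch, with made-up numbers and a hypothetical function name, assuming sequence numbers grow at a roughly constant rate:

```python
from datetime import date, timedelta


def estimate_date(seq: int, ref_seq: int, ref_date: date, per_day: float) -> date:
    """Guess a recording date from its sequence number.

    Assumes a known reference point (ref_seq was recorded on ref_date) and a
    constant average of per_day recordings; both are illustrative inputs.
    """
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    days_offset = round((seq - ref_seq) / per_day)
    return ref_date + timedelta(days=days_offset)


# Illustrative values only: 30,000 recordings after the reference point.
print(estimate_date(1_030_000, 1_000_000, date(2021, 1, 1), 1000))
```

The real rate varies from day to day, so this only gives a rough guess, but a rough guess on either side of a bracket change is enough to narrow down a birthday.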

But regardless, having something the user needs to update isn’t a great idea; it adds unnecessary work and error. Their decade of birth doesn’t change, but their age range does. Locally, I’m pretty sure that generation and socioeconomic status have the strongest effect on accent. If you listen to how The Beatles spoke in the 1960s compared to The La’s in the 2000s, there’s a clear difference in how the Scouse accent has changed. My grandma’s accent was different to my mum’s and my friends’, and accents do drift over time. So I think birth decade plus recording year plus accent tags are more useful than age range, as they add valuable context.

The tags should be as static as possible if they’re to be useful IMO - it’s their statistical value that matters and outliers aren’t really useful in that context. Someone changing accent or gender is enough of a corner case that “unknown, specific, multiple” is more useful than random recordings made at an unknown date by someone who admits that they’re inconsistent. Applying the tags to existing data gives us more data that’s correct on average. Not adding it to appease the personal identities of statistical outliers seems like folly to me. Their identity is unimportant, they’re supposed to be anonymous.

Yes, in my 35+ years of programming experience I have never written a form asking for age; when needed, I asked for DoB or year of birth.

One more thing: the data is user-provided. It can be fake. For example, if you want to have an equal number of each gender in a training set, you need to moderate the data.