Accent doesn’t just vary by region; it also depends on sex, age, social class, ethnicity, native tongue and even sexuality, all of which are quite personal to the speaker.
Rather than cataloguing every accent and either putting people into buckets or forcing them to self-identify, I think it’d make more sense to detect accents automatically and not care about human classification at all.
Maybe use a single sentence in each language that every contributor reads from time to time, and compute accent markers from that data. The discovered clusters and the distances from them become the accent detection / calibration data, and end-users read that same phrase to get their own accent calibration.
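Roughly what I have in mind, as a very hand-wavy sketch: it assumes some embedding model exists that can turn a recording of the calibration sentence into a fixed-length vector (the `extract_embedding` function below is a placeholder for that), and it uses scikit-learn’s KMeans as just one possible way to form the clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_embedding(wav_path: str) -> np.ndarray:
    """Placeholder: map one recording of the shared calibration
    sentence to a fixed-length accent-marker vector."""
    raise NotImplementedError

def build_accent_space(wav_paths: list[str], n_clusters: int = 8) -> KMeans:
    # One embedding per contributor reading the shared sentence.
    X = np.stack([extract_embedding(p) for p in wav_paths])
    # The discovered clusters become the accent calibration data.
    return KMeans(n_clusters=n_clusters, n_init=10).fit(X)

def calibrate(model: KMeans, wav_path: str) -> tuple[int, float]:
    # An end-user reads the same phrase; return the nearest cluster
    # and the distance to its centre.
    x = extract_embedding(wav_path).reshape(1, -1)
    cluster = int(model.predict(x)[0])
    return cluster, float(model.transform(x)[0, cluster])
```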
Is this how it actually works, with labels just used to detect skew in the data sets?
It’s an interesting idea, but it’s very hard to develop something that automatically detects accents without any labeled data. You talk about clustering by distance, but I have never seen a distance metric that actually produces natural clustering by accent. There is just too much other noise: vocal tract length, microphone, speaking rate, background sounds… Even if you could get such natural clusters, you wouldn’t be able to put a name to them or compare them with any ground truth without some human labeling or another outside source.
As you say, there are some difficulties with putting people into buckets or forcing them to self-identify, but it still provides a place to start from, even if it isn’t perfect. Can I ask why you think it would make more sense not to care about human classification? I can certainly see why people would be concerned about privacy, for example.
When you ask “is this how it actually works”, bear in mind that this dataset can be used for many different purposes by many different developers and groups. One is training models for speech recognition, in which case you might, for example, train a model for just one accent, or one for each accent, or provide the accent label as an input to the model. Another could be training accent identification models. Other projects present the CommonVoice data to human learners of a foreign language as a pronunciation model. Within the CommonVoice project itself, I’ve proposed using the accent labels to present reviewers with only the accents they’re familiar with.
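To make the “accent label as input” option concrete, here is a hedged, PyTorch-flavoured sketch; the class name, layer sizes and accent vocabulary are all invented for illustration, not anything from a real CommonVoice pipeline.

```python
import torch
import torch.nn as nn

class AccentConditionedASR(nn.Module):
    """Toy acoustic model that takes the accent label as an extra input."""
    def __init__(self, n_accents: int, feat_dim: int = 80,
                 accent_dim: int = 16, hidden: int = 256, n_tokens: int = 32):
        super().__init__()
        # Learned embedding for the dataset's accent label.
        self.accent_emb = nn.Embedding(n_accents, accent_dim)
        self.encoder = nn.GRU(feat_dim + accent_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_tokens)

    def forward(self, feats: torch.Tensor, accent_id: torch.Tensor):
        # feats: (batch, time, feat_dim); accent_id: (batch,)
        a = self.accent_emb(accent_id).unsqueeze(1).expand(-1, feats.size(1), -1)
        h, _ = self.encoder(torch.cat([feats, a], dim=-1))
        return self.out(h)  # per-frame token logits

# e.g. 4 utterances, 100 frames of features, 20 possible accent labels:
model = AccentConditionedASR(n_accents=20)
logits = model(torch.randn(4, 100, 80), torch.randint(0, 20, (4,)))
```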
Yeah, my point was that they wouldn’t matter from the point of view of recognition, only for human categorisation. A “speaker profile” could be completely independent of that.
Yes, mostly because it depends on knowing a hell of a lot about each speaker and their background, which violates the privacy requirements of the project. That, and after thinking about it for a bit I realised that accent is a broad and deep problem that is very difficult to apply categories to, filled with social, cultural and cognitive biases that invite decades of bikeshedding. Machine learning seems like an obvious way to sidestep all that.
I meant in the context of recognition. But the other points you raise are good ones.
Maybe there’s some value in finding a few sentences that work for everyone and offer the most variation, and inviting university researchers to build a database of tagged recordings that could be used to detect accent?
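One way to pick such sentences would be greedily maximising phoneme coverage. A rough sketch, where `phonemize()` is a hypothetical grapheme-to-phoneme step (real tools for this exist, e.g. eSpeak-based phonemizers):

```python
def phonemize(sentence: str) -> set[str]:
    """Hypothetical grapheme-to-phoneme step returning the
    set of phonemes the sentence exercises."""
    raise NotImplementedError

def pick_calibration_sentences(candidates: list[str], k: int = 3) -> list[str]:
    # Greedy set cover: repeatedly take the sentence that adds the
    # most phonemes not yet covered by the sentences chosen so far.
    pools = {s: phonemize(s) for s in candidates}
    covered: set[str] = set()
    chosen: list[str] = []
    while pools and len(chosen) < k:
        best = max(pools, key=lambda s: len(pools[s] - covered))
        chosen.append(best)
        covered |= pools.pop(best)
    return chosen
```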
And what about copying a fully trained, working standard English model (10,000 h, just as an example) together with a fully trained 10,000 h British Isles dialect model (Scottish, say) and training both together? If this works, you add more (Irish) and so on.
But the more you add, the further you go beyond terabyte volumes and end up at petabytes, or storage farms, just to try this. Processing and storage of such huge amounts of data would be the challenge.
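In other words, something like ordinary fine-tuning: copy the weights of the trained standard English model and continue training on the combined standard + dialect data. A minimal sketch, assuming PyTorch-style datasets and a model whose output fits a cross-entropy loss; all the names here are placeholders, not real CommonVoice tooling.

```python
import copy
import torch
from torch.utils.data import ConcatDataset, DataLoader

def fine_tune(base_model: torch.nn.Module, standard_ds, dialect_ds,
              epochs: int = 1, lr: float = 1e-4) -> torch.nn.Module:
    model = copy.deepcopy(base_model)  # keep the original model intact
    # "Train both together": one loader over the combined data.
    loader = DataLoader(ConcatDataset([standard_ds, dialect_ds]),
                        batch_size=16, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feats, targets in loader:
            opt.zero_grad()
            loss_fn(model(feats), targets).backward()
            opt.step()
    return model  # repeat with the next dialect (Irish, ...) in the same way
```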