Common Voice classification of German audio samples

Hi everyone,

As a linguist and believer in open source, I am delighted to learn that you have started this dataset, thus freeing speech recognition from big corporate control.

I would like to add my opinion on the way you are grouping the sound samples.

Unlike English, the varieties of German should not be split along political borders.
The variations within Germany are far more important than those between Germany and Austria.
Even when all speakers pronounce phrases in Standard German (Hochdeutsch/Schriftdeutsch), their regiolect remains the most important factor shaping their accent.

The Linguasphere classification (4 main variants) could be a starting point:

Note that the average speaker might often not be able to reliably identify their own dialect.

However, asking speakers which town or village, if any, they consider their language use typical of should provide reliable data (the "if any" lets us exclude atypical samples, for example from people who have recently moved). These answers can then be mapped to dialect groups, giving a usable classification that works much like the accent classification for the variants of English.
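A minimal sketch of that mapping step, in Python. The town-to-group table and the group labels are hypothetical placeholders, not an existing Common Voice list; a real table would have to be built from dialect maps such as the Linguasphere classification. Speakers who name no typical town are kept out of the regional classification rather than guessed.

```python
from typing import Optional

# Hypothetical lookup table: self-reported town -> coarse dialect group.
# Labels and assignments are illustrative placeholders, not curated data.
TOWN_TO_GROUP = {
    "Hamburg": "North German",
    "Köln": "Middle German",
    "München": "South German + Austrian",
    "Wien": "South German + Austrian",
    "Basel": "Swiss German/Alemannic",
}

def classify(town: Optional[str]) -> str:
    """Map the town a speaker considers typical of their speech to a group.

    None means the speaker named no town (e.g. they recently moved),
    so the sample is excluded from the regional classification.
    """
    if town is None:
        return "unclassified"
    return TOWN_TO_GROUP.get(town, "unknown town")
```

Unknown towns are flagged instead of silently assigned, so the table can be extended as new answers come in.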

On the other hand, most international variants like “Griechisch Deutsch” (Greek German) seem irrelevant and confusing to me, given the tiny number of speakers. If they are relevant at all, they should be moved to a sub-menu such as “German with a foreign accent”.

I understand that Ruben is already working on reorganizing this, so if I can be of any help regarding German, please let me know.

Good points, I agree with most of them. But IMO we should keep all international forms of German and accents such as Namibian German; there is no reason to delete them.

Note that the average speaker might often not be able to reliably identify their own dialect.

To simplify this for the average user, it might help to just call these categories “North German”, “Middle German”, “South German + Austrian” and “Swiss German/Alemannic”. More people would probably understand these than the scientific terms.

Maybe a general category “German” for people without any dialect could still be useful.

PS: did you know that there is a Frisian version of Common Voice? And there have been discussions about a Swiss German version too.

As far as I know, the plan is to let users enter their country, region and nearest city. This is explained in more detail here: 🗣 Feedback needed: Languages and accents strategy


The same problem exists for other languages whose areas do not match administrative boundaries, e.g. Arabic and Basque.

Providing the nearest city is no doubt a much better approach than the current classification. Cities have always been cultural hotspots, and language evolution is linked to cultural evolution.
One should bear in mind that there are some complex situations, such as multilingual cities (e.g. Brussels, Belgium, or Fribourg, Switzerland). It must be possible to assign a city to more than one language.
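One simple way to allow this is to store a set of languages per city instead of a single value, so bilingual cities need no special handling. A minimal Python sketch, assuming a hypothetical in-memory registry; `register` and `cities_for` are illustrative names, not an existing Common Voice API.

```python
from collections import defaultdict

# Hypothetical registry: each city maps to a *set* of languages.
city_languages: dict[str, set[str]] = defaultdict(set)

def register(city: str, *languages: str) -> None:
    """Attach one or more languages to a city."""
    city_languages[city].update(languages)

def cities_for(language: str) -> list[str]:
    """All registered cities where the given language is spoken, sorted."""
    return sorted(c for c, langs in city_languages.items() if language in langs)

register("Brussels", "French", "Dutch")
register("Fribourg", "French", "German")
register("Hamburg", "German")
```

With this layout Fribourg shows up under both French and German without being duplicated.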

I don’t see much value in asking for the region, because language variation doesn’t follow administrative boundaries. If you look at the map above, only 8 German states (Länder) lie (largely) within one area, 7 span two areas, and Baden-Württemberg even belongs to three different areas.

What about a more radical approach: asking speakers to choose a location on a map (via the OpenStreetMap API)?
Everyone would be able to do this, and the resulting data would be highly flexible and future-proof:

  • Developers or linguists wanting to understand the gradual evolution of language phenomena could select the relevant data by defining a geographical zone (and refining it afterwards).
  • Situations where the closest city is not in the same country/region would pose no problem. Think of German speakers in southern Denmark, for example: the closest city, Flensburg, is not far away, but a “country → region → city” selection would leave them unable to choose Flensburg. The same goes for Basel (Switzerland) and the far south-western corner of Germany, etc.
  • In the future, shifts in administrative control, as well as language phenomena that shift geographically, could be handled more easily.
  • It avoids headaches for users in regions with unclear political status or ongoing conflicts. There is no need to worry whether someone from a given town is a legitimate speaker of a given language, and no need to keep reworking the country/language list. Users just give one language and a set of coordinates, and that’s it.
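To illustrate the first point, here is a minimal Python sketch of selecting clips by geographical zone. It assumes each clip stores the latitude/longitude the speaker picked on the map; the `Clip` record, the circular zone and the sample coordinates (approximate) are assumptions for illustration. Distances use the haversine great-circle formula, and country or region is never consulted, so a zone around Flensburg naturally includes a speaker just across the Danish border.

```python
import math
from dataclasses import dataclass
from typing import Optional

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

@dataclass
class Clip:
    clip_id: str
    language: str
    lat: float  # degrees, as picked on the map
    lon: float

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def select_zone(clips, lat, lon, radius_km, language: Optional[str] = None):
    """Clips located within radius_km of (lat, lon), optionally by language.

    Only coordinates are compared, so a zone can freely cross borders.
    """
    return [
        c for c in clips
        if (language is None or c.language == language)
        and haversine_km(lat, lon, c.lat, c.lon) <= radius_km
    ]

# Approximate coordinates, for illustration only.
clips = [
    Clip("flensburg", "de", 54.78, 9.44),
    Clip("aabenraa", "de", 55.04, 9.42),  # southern Denmark, about 29 km away
    Clip("hamburg", "de", 53.55, 9.99),   # about 140 km away
]
nearby = select_zone(clips, 54.78, 9.44, radius_km=60)  # Flensburg + Aabenraa
```

Refining the zone afterwards is just a matter of calling `select_zone` again with a new centre or radius; a polygon test could replace the circle if needed.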

Add an option to choose a 50 km radius instead of a precise location for those who are concerned about their privacy.
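For the radius option, one possible technique (an assumption on my part, not something Common Voice has announced) is to store a randomly displaced point instead of the precise one, so that the true location lies somewhere within the chosen radius. A minimal Python sketch; `blur_location` is a hypothetical helper, and the displacement uses the standard destination-point formula for a given distance and bearing.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def blur_location(lat, lon, radius_km=50.0, rng=random):
    """Hypothetical helper: a random point at most radius_km from (lat, lon).

    Distance is drawn as radius_km * sqrt(u) so points spread evenly over
    the (locally flat) disc instead of clustering near the true location.
    """
    d = radius_km * math.sqrt(rng.random()) / EARTH_RADIUS_KM  # angular distance
    bearing = rng.uniform(0.0, 2 * math.pi)
    p1, l1 = math.radians(lat), math.radians(lon)
    # Destination point given start, angular distance d and bearing.
    p2 = math.asin(math.sin(p1) * math.cos(d)
                   + math.cos(p1) * math.sin(d) * math.cos(bearing))
    l2 = l1 + math.atan2(math.sin(bearing) * math.sin(d) * math.cos(p1),
                         math.cos(d) - math.sin(p1) * math.sin(p2))
    # Normalise longitude to [-180, 180).
    return math.degrees(p2), (math.degrees(l2) + 540.0) % 360.0 - 180.0

# Store only the blurred point; the precise one is discarded.
stored = blur_location(47.56, 7.59, radius_km=50.0, rng=random.Random(7))
```

The displacement should be drawn once per speaker and reused; drawing a fresh one for every clip would let the true location be estimated by averaging the stored points.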

Here’s a suggestion for a user interface:

My pronunciation is influenced by an accent:

( ) very little   ( ) moderately   ( ) notably   ( ) heavily   ( ) don't know

My language use is typical of the following location: --> OpenStreetMap API
         [ ] more than one location

My language use is influenced by:
[ ] native language: ___________
      [ ] more than one
[ ] social group: ___________
      [ ] more than one
[ ] other factors: ___________

Remarks: Users should be able to choose a location even if they think they have no accent, because, linguistically speaking, there is no such thing as zero accent, and popular notions of accent differ from what is useful for linguistics and speech recognition. Sociolect is not very relevant for German, but it is linguistically interesting elsewhere (e.g. Cockney in East London). It could be implemented as a free-text field for later use, or as a drop-down menu for languages with notable sociolects.
“Other factors” lets the developers learn about their users and adapt the UI in the future if relevant factors come up. (I hereby volunteer to keep an eye on the last two points, should that be needed.)
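The proposed form could be captured in a record like the one below. This is a hypothetical Python data model, not Common Voice's actual schema, and all field names are illustrative. Each location carries an optional radius so that the 50 km privacy option fits into the same field, and the social-group and other-factor answers stay free text, as suggested in the remarks.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class AccentStrength(Enum):
    VERY_LITTLE = "very little"
    MODERATELY = "moderately"
    NOTABLY = "notably"
    HEAVILY = "heavily"
    DONT_KNOW = "don't know"

@dataclass
class Location:
    lat: float
    lon: float
    radius_km: Optional[float] = None  # None = precise point, e.g. 50.0 for privacy

@dataclass
class AccentProfile:
    accent_strength: AccentStrength
    # Offered even for "very little": zero accent does not exist.
    locations: list[Location] = field(default_factory=list)  # more than one allowed
    native_languages: list[str] = field(default_factory=list)
    social_groups: list[str] = field(default_factory=list)  # free text, for later use
    other_factors: Optional[str] = None  # free text, reviewed to improve the UI

profile = AccentProfile(
    AccentStrength.VERY_LITTLE,
    locations=[Location(54.78, 9.44, radius_km=50.0)],
)
```

Keeping the answers as lists rather than single values mirrors the "more than one" checkboxes in the form.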

If variants like “Greek German” are to be kept, someone should at the very least make clear what they actually are:
The utterances of Germans who migrated to Greece decades or centuries ago and kept speaking German? The utterances of Greek people who migrated to Germany and now speak German with a Greek accent? Both exist, but they are obviously completely different. This is not at all clear to the average user and is highly confusing.

Thanks for the feedback. We are definitely considering a more location-oriented metadata strategy to better understand how people are likely to sound. We are currently evaluating the requirements and limitations with our legal team.