List of languages with variants launched on Common Voice

Hey!

I am looking for a list of languages on MCV that have multiple variants, possibly including languages that have multiple scripts translated under one locale.

One example I have found is Portuguese (Brazil and Portugal), although both use the same writing script (Latin).

Most languages have some presets under accents for these; as far as I can see, they might want to keep that for consistency. Apart from Portuguese, most major languages fall into this category…

Here are the current variants (as of 2024-09-25):

id	lc	tag	name
1	cy	cy-northwes	North-Western Welsh
2	cy	cy-northeas	North-Eastern Welsh
3	cy	cy-midwales	Mid Wales
4	cy	cy-southwes	South-Western Welsh
5	cy	cy-southeas	South-Eastern Welsh
6	cy	cy-wladfa	Patagonian Welsh
7	sw	sw-sanifu	Kiswahili Sanifu (EA)
8	sw	sw-barake	Kiswahili cha Bara ya Kenya
9	sw	sw-baratz	Kiswahili cha Bara ya Tanzania
10	sw	sw-kingwana	Kingwana (DRC)
11	sw	sw-kimvita	Kimvita (KE) - Central dialect
12	sw	sw-kibajuni	Kibajuni (KE) - Northern dialect
13	sw	sw-kimrima	Kimrima (TZ) - Northern dialect
14	sw	sw-kiunguja	Kiunguja (TZ) - Southern dialect
15	sw	sw-kipemba	Kipemba (TZ) - Southern dialect
16	sw	sw-kikae	Kimakunduchi/Kikae (TZ) - Southern dialect
17	pt	pt-BR	Portuguese (Brasil)
18	pt	pt-PT	Portuguese (Portugal)
325	ca	ca-central	Central
326	ca	ca-balear	Balear
327	ca	ca-nwestern	Nord-Occidental
328	ca	ca-northern	Septentrional
329	ca	ca-valencia-southern	Valencià meridional
330	ca	ca-valencia-alacant	Alacantí
331	ca	ca-valencia-northern	Valencià septentrional
332	ca	ca-valencia-tortosi	Tortosí
333	ca	ca-valencia-central	Valencià central
334	ca	ca-algueres	Alguerès
503	zgh	zgh-shi	ⵜⴰⵛⵍⵃⵉⵜ (Tachelhit)
504	zgh	zgh-tzm	ⵜⴰⵎⴰⵣⵉⵖⵜ ⵏ ⵡⴰⵟⵍⴰⵚ ⴰⵏⴰⵎⵎⴰⵙ (Central Atlas Tamazight)
505	zgh	zgh-rif	ⵜⴰⵔⵉⴼⵉⵜ (Tarifit)
940	tui	tui-bangwere	Ɓaŋwere (Tupuri Bangwere)
941	tui	tui-banggo	Ɓaŋgɔ̀ (Tupuri Banggo)
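
If you want to turn a dump like this into a per-language answer, here is a minimal sketch. It assumes the rows are tab-separated with the tag in the third column and the display name in the fourth, as in the rows above. The helper name `variants_by_locale` and the `SAMPLE` string are made up for illustration and are not part of the Common Voice codebase; only a few rows are copied in.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the variants table (tab-separated: id, lc, tag, name).
SAMPLE = "\n".join([
    "1\tcy\tcy-northwes\tNorth-Western Welsh",
    "6\tcy\tcy-wladfa\tPatagonian Welsh",
    "17\tpt\tpt-BR\tPortuguese (Brasil)",
    "18\tpt\tpt-PT\tPortuguese (Portugal)",
    "505\tzgh\tzgh-rif\tⵜⴰⵔⵉⴼⵉⵜ (Tarifit)",
])


def variants_by_locale(tsv_text):
    """Group variant tags by locale code (hypothetical helper)."""
    groups = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue  # skip blank or malformed lines
        _id, lc, tag, _name = row[:4]
        groups[lc].append(tag)
    return dict(groups)


groups = variants_by_locale(SAMPLE)
# Locales that have more than one variant in the sample.
multi = sorted(lc for lc, tags in groups.items() if len(tags) > 1)
print(multi)
```

Pasting the full table into `SAMPLE` (without the header line) should give the complete list of locales with more than one variant.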

Here is the long list of preset accents:

id	lc	group	token	name
1	en	default	unspecified	
2	de	default	unspecified	
3	fr	default	unspecified	
4	cy	default	unspecified	
5	br	default	unspecified	
6	cv	default	unspecified	
7	tr	default	unspecified	
8	tt	default	unspecified	
9	kab	default	unspecified	
10	ky	default	unspecified	
11	ga-IE	default	unspecified	
12	sl	default	unspecified	
13	ca	default	unspecified	
14	it	default	unspecified	
15	zh-TW	default	unspecified	
16	eo	default	unspecified	
17	nl	default	unspecified	
18	zh-CN	default	unspecified	
19	zh-HK	default	unspecified	
20	ace	default	unspecified	
21	an	default	unspecified	
22	ar	default	unspecified	
23	as	default	unspecified	
24	ast	default	unspecified	
25	az	default	unspecified	
26	bg	default	unspecified	
27	bn	default	unspecified	
28	bxr	default	unspecified	
29	cak	default	unspecified	
30	cnh	default	unspecified	
31	cs	default	unspecified	
32	da	default	unspecified	
33	dsb	default	unspecified	
34	el	default	unspecified	
35	es	default	unspecified	
36	et	default	unspecified	
37	eu	default	unspecified	
38	fa	default	unspecified	
39	fi	default	unspecified	
40	fo	default	unspecified	
41	fy-NL	default	unspecified	
42	he	default	unspecified	
43	hsb	default	unspecified	
44	hu	default	unspecified	
45	ia	default	unspecified	
46	id	default	unspecified	
47	is	default	unspecified	
48	ja	default	unspecified	
49	ka	default	unspecified	
50	kk	default	unspecified	
51	ko	default	unspecified	
52	kpv	default	unspecified	
53	kw	default	unspecified	
54	mdf	default	unspecified	
55	mk	default	unspecified	
56	mn	default	unspecified	
57	myv	default	unspecified	
58	nb-NO	default	unspecified	
59	ne-NP	default	unspecified	
60	nn-NO	default	unspecified	
61	oc	default	unspecified	
62	or	default	unspecified	
63	pl	default	unspecified	
64	pt	default	unspecified	
65	rm-sursilv	default	unspecified	
66	ro	default	unspecified	
67	ru	default	unspecified	
68	sah	default	unspecified	
69	sc	default	unspecified	
70	sk	default	unspecified	
71	sq	default	unspecified	
72	sr	default	unspecified	
73	sv-SE	default	unspecified	
74	ta	default	unspecified	
75	te	default	unspecified	
76	th	default	unspecified	
77	uk	default	unspecified	
78	ur	default	unspecified	
79	uz	default	unspecified	
80	vi	default	unspecified	
81	af	default	unspecified	
82	ab	default	unspecified	
83	ady	default	unspecified	
84	am	default	unspecified	
85	dv	default	unspecified	
86	mhr	default	unspecified	
87	mrj	default	unspecified	
88	uby	default	unspecified	
89	udm	default	unspecified	
90	vot	default	unspecified	
91	hr	default	unspecified	
92	rw	default	unspecified	
93	izh	default	unspecified	
94	lt	default	unspecified	
95	gl	default	unspecified	
96	lv	default	unspecified	
97	hi-IN	default	unspecified	
98	lij	default	unspecified	
99	tg	default	unspecified	
100	ba	default	unspecified	
101	ha	default	unspecified	
102	ckb	default	unspecified	
103	ml	default	unspecified	
104	si	default	unspecified	
105	ff	default	unspecified	
106	rm-vallader	default	unspecified	
107	syr	default	unspecified	
108	mt	default	unspecified	
109	sw	default	unspecified	
110	arn	default	unspecified	
111	be	default	unspecified	
112	mg	default	unspecified	
113	pa-IN	default	unspecified	
114	kbd	default	unspecified	
115	lg	default	unspecified	
116	my	default	unspecified	
117	scn	default	unspecified	
118	kaa	default	unspecified	
119	tl	default	unspecified	
120	vec	default	unspecified	
121	hy-AM	default	unspecified	
122	hi	default	unspecified	
123	hyw	default	unspecified	
124	co	default	unspecified	
125	gn	default	unspecified	
126	bm	default	unspecified	
127	kmr	default	unspecified	
128	bas	default	unspecified	
129	yue	default	unspecified	
130	ht	default	unspecified	
131	mai	default	unspecified	
132	mos	default	unspecified	
133	mr	default	unspecified	
134	ms	default	unspecified	
135	ps	default	unspecified	
136	shi	default	unspecified	
137	so	default	unspecified	
138	ug	default	unspecified	
139	pap-AW	default	unspecified	
140	nia	default	unspecified	
141	tw	default	unspecified	
142	yo	default	unspecified	
143	nyn	default	unspecified	
144	ie	default	unspecified	
145	sat	default	unspecified	
146	ki	default	unspecified	
147	nan-tw	default	unspecified	
148	yi	default	unspecified	
149	ig	default	unspecified	
150	ty	default	unspecified	
151	quc	default	unspecified	
152	ti	default	unspecified	
153	mni	default	unspecified	
154	tig	default	unspecified	
256	br	preset	gwenedeg	Gwenedeg
257	br	preset	kerneveg	Kerneveg
258	br	preset	leoneg	Leoneg
259	br	preset	tregerieg	Tregerieg
260	ca	preset	balearic	balear
261	ca	preset	central	central
262	ca	preset	northwestern	nord-occidental
263	ca	preset	northern	septentrional
264	ca	preset	valencian	valencià
265	ca	preset	learner_es	aprenent (recent, des del castellà)
266	ca	preset	learner_other	aprenent (recent, des d'altres llengües)
267	cy	preset	united_kingdom	Y Deyrnas Unedig Cymraeg
268	de	preset	germany	Deutschland Deutsch
269	de	preset	netherlands	Niederländisch Deutsch
270	de	preset	austria	Österreichisches Deutsch
271	de	preset	poland	Polnisch Deutsch
272	de	preset	switzerland	Schweizerdeutsch
273	de	preset	united_kingdom	Britisches Deutsch
274	de	preset	france	Französisch Deutsch
275	de	preset	denmark	Dänisch Deutsch
276	de	preset	belgium	Belgisches Deutsch
277	de	preset	hungary	Ungarisch Deutsch
278	de	preset	brazil	Brasilianisches Deutsch
279	de	preset	czechia	Tschechisch Deutsch
280	de	preset	united_states	Amerikanisches Deutsch
281	de	preset	slovakia	Slowakisch Deutsch
282	de	preset	russia	Russisch Deutsch
283	de	preset	kazakhstan	Kasachisch Deutsch
284	de	preset	italy	Italienisch Deutsch
285	de	preset	finland	Finnisch Deutsch
286	de	preset	slovenia	Slowenisch Deutsch
287	de	preset	canada	Kanadisches Deutsch
288	de	preset	bulgaria	Bulgarisch Deutsch
289	de	preset	greece	Griechisch Deutsch
290	de	preset	lithuania	Litauisch Deutsch
291	de	preset	luxembourg	Luxemburgisches Deutsch
292	de	preset	paraguay	Paraguayisch Deutsch
293	de	preset	romania	Rumänisch Deutsch
294	de	preset	liechtenstein	liechtensteinisches Deutscher
295	de	preset	namibia	Namibisch Deutsch
296	de	preset	turkey	Türkisch Deutsch
297	en	preset	us	United States English
298	en	preset	australia	Australian English
299	en	preset	england	England English
300	en	preset	canada	Canadian English
301	en	preset	philippines	Filipino
302	en	preset	hongkong	Hong Kong English
303	en	preset	indian	India and South Asia (India, Pakistan, Sri Lanka)
304	en	preset	ireland	Irish English
305	en	preset	malaysia	Malaysian English
306	en	preset	newzealand	New Zealand English
307	en	preset	scotland	Scottish English
308	en	preset	singapore	Singaporean English
309	en	preset	southatlandtic	South Atlantic (Falkland Islands, Saint Helena)
310	en	preset	african	Southern African (South Africa, Zimbabwe, Namibia)
311	en	preset	wales	Welsh English
312	en	preset	bermuda	West Indies and Bermuda (Bahamas, Bermuda, Jamaica, Trinidad)
313	eo	preset	internacia	Internacia
314	es	preset	nortepeninsular	España: Norte peninsular (Asturias, Castilla y León, Cantabria, País Vasco, Navarra, Aragón, La Rioja, Guadalajara, Cuenca)
315	es	preset	centrosurpeninsular	España: Centro-Sur peninsular (Madrid, Toledo, Castilla-La Mancha)
316	es	preset	surpeninsular	España: Sur peninsular (Andalucia, Extremadura, Murcia)
317	es	preset	canario	España: Islas Canarias
318	es	preset	mexicano	México
319	es	preset	americacentral	América central
320	es	preset	caribe	Caribe: Cuba, Venezuela, Puerto Rico, República Dominicana, Panamá, Colombia caribeña, México caribeño, Costa del golfo de México
321	es	preset	andino	Andino-Pacífico: Colombia, Perú, Ecuador, oeste de Bolivia y Venezuela andina
322	es	preset	rioplatense	Rioplatense: Argentina, Uruguay, este de Bolivia, Paraguay
323	es	preset	chileno	Chileno: Chile, Cuyo
324	es	preset	filipinas	Español de Filipinas
325	eu	preset	mendebalekoa	Mendebalekoa (Araka, Bizkaia, Gipuzkoako mendebaleko herri batzuk)
326	eu	preset	erdialdekoa_nafarra	Erdialdekoa edo Nafarra (Gipuzkoa, Nafarroa)
327	eu	preset	nafarlapurtarra_zuberoatarra	Nafar-lapurtarra edo Zuberotarra (Lapurdi, Nafarroa Beherea, Zuberoa)
328	fr	preset	france	Français de France
329	fr	preset	madagascar	Français de Madagascar
330	fr	preset	cameroon	Français du Cameroun
331	fr	preset	germany	Français d’Allemagne
332	fr	preset	united_kingdom	Français du Royaume-Uni
333	fr	preset	cote_d_ivoire	Français de Côte d’Ivoire
334	fr	preset	tunisia	Français de Tunisie
335	fr	preset	mali	Français du Mali
336	fr	preset	algeria	Français d’Algérie
337	fr	preset	canada	Français du Canada
338	fr	preset	morocco	Français du Maroc
339	fr	preset	burundi	Français du Burundi
340	fr	preset	senegal	Français du Sénégal
341	fr	preset	niger	Français du Niger
342	fr	preset	netherlands	Français des Pays-Bas
343	fr	preset	togo	Français de la République du Togo
344	fr	preset	burkina_faso	Français du Burkina-Faso
345	fr	preset	belgium	Français de Belgique
346	fr	preset	congo_brazzaville	Français du Congo (Brazzaville)
347	fr	preset	congo_kinshasa	Français du Congo (Kinshasa)
348	fr	preset	italy	Français d’Italie
349	fr	preset	benin	Français du Bénin
350	fr	preset	romania	Français de Roumanie
351	fr	preset	guinea	Français de Guinée
352	fr	preset	chad	Français du Tchad
353	fr	preset	central_african_republic	Français de République centrafricaine
354	fr	preset	united_states	Français des États-Unis
355	fr	preset	switzerland	Français de Suisse
356	fr	preset	portugal	Français du Portugal
357	fr	preset	gabon	Français du Gabon
358	fr	preset	syria	Français de Syrie
359	fr	preset	greece	Français de Grèce
360	fr	preset	austria	Français d’Autriche
361	fr	preset	ireland	Français d’Irlande
362	fr	preset	reunion	Français de La Réunion
363	fr	preset	mauritania	Français de Mauritanie
364	fr	preset	luxembourg	Français du Luxembourg
365	fr	preset	haiti	Français d’Haïti
366	fr	preset	comoros	Français des Comores
367	fr	preset	martinique	Français de Martinique
368	fr	preset	guadeloupe	Français de Guadeloupe
369	fr	preset	hungary	Français d’Hongrie
370	fr	preset	new_caledonia	Français de Nouvelle-Calédonie
371	fr	preset	french_polynesia	Français de Polynésie française
372	fr	preset	french_guiana	Français de Guyane
373	fr	preset	vanuatu	Français du Vanuatu
374	fr	preset	mayotte	Français de Mayotte
375	fr	preset	cyprus	Français de Chypre
376	fr	preset	equatorial_guinea	Français de Guinée équatoriale
377	fr	preset	seychelles	Français des Seychelles
378	fr	preset	malta	Français de Malte
379	fr	preset	mauritius	Français de l’Île Maurice
380	fr	preset	st_martin	Français de Saint-Martin
381	fr	preset	monaco	Français de Monaco
382	fr	preset	lebanon	Français du Liban
383	fr	preset	djibouti	Français de Djibouti
384	fr	preset	wallis_et_futuna	Français de Wallis et Futuna
385	fr	preset	st_barthelemy	Français de Saint-Barthélemy
386	fr	preset	andorra	Français d’Andorre
387	fr	preset	st_pierre_et_miquelon	Français de Saint-Pierre-et-Miquelon
388	fr	preset	rwanda	Français du Rwanda
389	nl	preset	netherlands	Nederlands Nederlands
390	nl	preset	belgium	Belgisch Nederlands
391	nl	preset	suriname	Surinaams Nederlands
392	nl	preset	france	Frans Nederlands
393	nl	preset	germany	Duits Nederlands
394	nl	preset	curacao	Nederlands van Curaçao
395	nl	preset	aruba	Nederlands van Aruba
396	nl	preset	sint_maarten	Nederlands van Sint-Maarten
397	nl	preset	south_africa	Zuid-Afrikaans Nederlands
398	nl	preset	namibia	Namibisch Nederlands
399	nl	preset	indonesia	Indonesisch Nederlands
400	ga-IE	preset	mumhain	Gaeilge na Mumhan
401	ga-IE	preset	connachta	Gaeilge Chonnacht
402	ga-IE	preset	ulaidh	Gaeilge Uladh
403	gl	preset	atlantico	Atlántico (seseo e gheada)
404	gl	preset	central	Central (gheada)
405	gl	preset	oriental	Oriental (común en zona oriental)
406	gl	preset	normativo	Normativo (estándar)
407	gl	preset	neofalante	Neofalante
408	zh-TW	preset	keelung_city	出生地:基隆市
409	zh-TW	preset	taipei_city	出生地:臺北市
410	zh-TW	preset	new_taipei_city	出生地:新北市
411	zh-TW	preset	taoyuan_city	出生地:桃園市
412	zh-TW	preset	hsinchu_county	出生地:新竹縣
413	zh-TW	preset	hsinchu_city	出生地:新竹市
414	zh-TW	preset	miaoli_county	出生地:苗栗縣
415	zh-TW	preset	taichung_city	出生地:臺中市
416	zh-TW	preset	changhua_county	出生地:彰化縣
417	zh-TW	preset	nantou_county	出生地:南投縣
418	zh-TW	preset	yunlin_county	出生地:雲林縣
419	zh-TW	preset	chiayi_county	出生地:嘉義縣
420	zh-TW	preset	chiayi_city	出生地:嘉義市
421	zh-TW	preset	tainan_city	出生地:臺南市
422	zh-TW	preset	kaohsiung_city	出生地:高雄市
423	zh-TW	preset	pingtung_county	出生地:屏東縣
424	zh-TW	preset	yilan_county	出生地:宜蘭縣
425	zh-TW	preset	hualien_county	出生地:花蓮縣
426	zh-TW	preset	taitung_county	出生地:臺東縣
427	zh-TW	preset	penghu_county	出生地:澎湖縣
428	zh-TW	preset	kinmen_county	出生地:金門縣
429	zh-TW	preset	lienchiang_county	出生地:連江縣
430	zh-TW	preset	hong_kong	香港
431	zh-CN	preset	110000	出生地:11 北京市
432	zh-CN	preset	120000	出生地:12 天津市
433	zh-CN	preset	130000	出生地:13 河北省
434	zh-CN	preset	140000	出生地:14 山西省
435	zh-CN	preset	150000	出生地:15 内蒙古自治区
436	zh-CN	preset	210000	出生地:21 辽宁省
437	zh-CN	preset	220000	出生地:22 吉林省
438	zh-CN	preset	230000	出生地:23 黑龙江省
439	zh-CN	preset	310000	出生地:31 上海市
440	zh-CN	preset	320000	出生地:32 江苏省
441	zh-CN	preset	330000	出生地:33 浙江省
442	zh-CN	preset	340000	出生地:34 安徽省
443	zh-CN	preset	350000	出生地:35 福建省
444	zh-CN	preset	360000	出生地:36 江西省
445	zh-CN	preset	370000	出生地:37 山东省
446	zh-CN	preset	410000	出生地:41 河南省
447	zh-CN	preset	420000	出生地:42 湖北省
448	zh-CN	preset	430000	出生地:43 湖南省
449	zh-CN	preset	440000	出生地:44 广东省
450	zh-CN	preset	450000	出生地:45 广西壮族自治区
451	zh-CN	preset	460000	出生地:46 海南省
452	zh-CN	preset	500000	出生地:50 重庆市
453	zh-CN	preset	510000	出生地:51 四川省
454	zh-CN	preset	520000	出生地:52 贵州省
455	zh-CN	preset	530000	出生地:53 云南省
456	zh-CN	preset	540000	出生地:54 西藏自治区
457	zh-CN	preset	610000	出生地:61 陕西省
458	zh-CN	preset	620000	出生地:62 甘肃省
459	zh-CN	preset	630000	出生地:63 青海省
460	zh-CN	preset	640000	出生地:64 宁夏回族自治区
461	zh-CN	preset	650000	出生地:65 新疆维吾尔自治区
462	zh-CN	preset	710000	出生地:71 台湾省
463	zh-CN	preset	810000	出生地:81 香港特别行政区
464	zh-CN	preset	820000	出生地:82 澳门特别行政区
465	tk	default	unspecified	
665	quy	default	unspecified	
993	bs	default	unspecified	
994	gom	default	unspecified	
995	km	default	unspecified	
996	knn	default	unspecified	
1285	skr	default	unspecified	
1286	tok	default	unspecified	
1623	lb	default	unspecified	
3733	dyu	default	unspecified	
3734	nso	default	unspecified	
3735	om	default	unspecified	
3736	st	default	unspecified	
3737	ts	default	unspecified	
3738	ve	default	unspecified	
4071	nd	default	unspecified	
4072	nr	default	unspecified	
4073	ss	default	unspecified	
4074	tn	default	unspecified	
4075	xh	default	unspecified	
4076	zu	default	unspecified	
4498	hil	default	unspecified	
4758	jbo	default	unspecified	
4759	kn	default	unspecified	
4760	ln	default	unspecified	
4761	lo	default	unspecified	
4762	sdh	default	unspecified	
4763	snk	default	unspecified	
4764	zgh	default	unspecified	
5485	dag	default	unspecified	
5486	zza	default	unspecified	
5513	byv	default	unspecified	
5753	tyv	default	unspecified	
5754	wo	default	unspecified	
5988	nhe	default	unspecified	
6343	nhi	default	unspecified	
7457	bo	default	unspecified	
7458	ny	default	unspecified	
9680	ltg	default	unspecified	
10517	os	default	unspecified	
10681	ewo	default	unspecified	
10682	jv	default	unspecified	
10783	gu-IN	default	unspecified	
10896	lzz	default	unspecified	
11398	sco	default	unspecified	
11825	qvi	default	unspecified	
11849	wep	default	unspecified	
12019	fuf	default	unspecified	
12265	guc	default	unspecified	
12266	sd	default	unspecified	
12267	vmw	default	unspecified	
14168	unknown	default	unspecified	
14289	crh	default	unspecified	
14342	dav	default	unspecified	
14389	luo	default	unspecified	
14417	kln	default	unspecified	
15030	azz	default	unspecified	
15214	bal	default	unspecified	
15215	gos	default	unspecified	
16012	nqo	default	unspecified	
16106	cdo	default	unspecified	
16107	dar	default	unspecified	
16108	shn	default	unspecified	
16347	aa	default	unspecified	
16348	abb	default	unspecified	
16349	bax	default	unspecified	
16350	bba	default	unspecified	
16351	bbj	default	unspecified	
16352	bci	default	unspecified	
16353	beb	default	unspecified	
16354	bfd	default	unspecified	
16355	bkm	default	unspecified	
16356	bnm	default	unspecified	
16357	bri	default	unspecified	
16358	bum	default	unspecified	
16359	dua	default	unspecified	
16360	ebr	default	unspecified	
16361	eto	default	unspecified	
16362	fan	default	unspecified	
16363	fue	default	unspecified	
16364	gya	default	unspecified	
16365	ibb	default	unspecified	
16366	ksf	default	unspecified	
16367	mbo	default	unspecified	
16368	mxu	default	unspecified	
16369	nnh	default	unspecified	
16370	qxp	default	unspecified	
16371	rif	default	unspecified	
16372	teg	default	unspecified	
16373	tui	default	unspecified	
16374	tvu	default	unspecified	
16375	wes	default	unspecified	
16503	ee	default	unspecified	
16504	fub	default	unspecified	
16505	pcm	default	unspecified	
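
To see quickly which locales have preset accents and which only have the default "unspecified" entry, something like the sketch below could be used. It assumes the five tab-separated columns shown in the header (id, lc, group, token, name); the variable names and the small `ACCENT_ROWS` sample are made up for illustration.

```python
from collections import Counter

# A few rows copied from the accents list (tab-separated: id, lc, group, token, name).
ACCENT_ROWS = [
    "1\ten\tdefault\tunspecified\t",
    "2\tde\tdefault\tunspecified\t",
    "64\tpt\tdefault\tunspecified\t",
    "268\tde\tpreset\tgermany\tDeutschland Deutsch",
    "297\ten\tpreset\tus\tUnited States English",
    "298\ten\tpreset\taustralia\tAustralian English",
]

preset_counts = Counter()
locales = set()
for line in ACCENT_ROWS:
    fields = line.split("\t")
    if len(fields) < 4:
        continue  # skip blank or malformed lines
    lc, group = fields[1], fields[2]
    locales.add(lc)
    if group == "preset":
        preset_counts[lc] += 1

# Locales that only have the default "unspecified" entry.
only_default = sorted(locales - set(preset_counts))
print(dict(preset_counts))
print(only_default)
```

In this sample Portuguese ends up in `only_default`, which matches the list: `pt` has no preset accents there.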

I don’t know about scripts…
Except for one language, the variants collected in the last call have not yet been inserted into the database.

Thank you so much @bozden for always being so helpful!!

We all learn together :)

I think adding “preset accents” has not been possible since some point in 2022. I tried to add some for nan-tw but failed in https://github.com/common-voice/common-voice/issues/3708

@irvin, I can see that your https://github.com/common-voice/common-voice/pull/4075 is there but has not been merged yet. I wonder why.

Surely the team doesn’t think that accents are no longer needed now that variants are available? Free-form accent definitions are far from ideal.