Kurdish language (mis)classification and Kurmanji request

Kurdish is one of the languages in progress. However, I think it is necessary to specify which of the Kurdish dialects it is. The Kurdish language family consists of several major dialects: Kurmanji, Sorani, Palewani, Zaza and Gorani. The most widely spoken of these are Kurmanji and Sorani.
I am guessing the Kurdish listed in Common Voice is the Sorani dialect since it is predominantly written in the Perso-Arabic script.
I would like to collect voice samples in the Kurmanji dialect, which is written in the Latin-based Hawar script and spoken by 15 million people.
Would it be possible to add it to Pontoon as a new language so that Kurmanji speakers can also contribute?

If the language is on our standard list, it can be added to Pontoon as a new language:

https://unicode-org.github.io/cldr-staging/charts/37/supplemental/languages_and_scripts.html

If it’s not listed there as a language, we’ll capture phonetic differences through accents. Today I’ll be posting a big update on our language and accents strategy.

Published now: Common Voice languages and accent strategy v5

Do we have any updates here? We still don’t have a Kurmanji option with Latin script.

Can you help us understand which language you are referring to?

I see:

  • Central Kurdish ckb N Arabic
  • Kurdish ku N Arabic Cyrillic Latin
  • Southern Kurdish sdh N Arabic

Thanks!

I am referring to Northern Kurdish (ku) in Latin script. (Honestly, I don’t think this dialect is used in Arabic or Cyrillic script at all. Latin should be sufficient.)

I can see the language in the list, but strangely I cannot contribute to it. Is there any other process involved?

OK, I see where the issue is coming from: the Kurdish enabled on Pontoon was localized using the Arabic script instead of the Latin one.

We are currently investigating how we can enable a language with different scripts. Let me circle back to the team since we need to understand how to make our platform support this.

Thanks for your patience.

For the Kurdish language there are now 3 common active dialects:

1- Kurmanji

  • The most common Kurdish dialect. About 95% of its usage is in Latin script, so you can write its name as Kurdish (Kurmanji).
  • About 5% of its usage is in Arabic script, which is very little. You can write its name in Arabic script as Kurdish (کورمانجی).

2- Sorani, written in Arabic script. You can write its name as Kurdish (سورانی).

3- Zazaki, written in Latin script: Kurdish (Zazaki).

It would be great if you could check the list I linked and let us know which locale codes you are referring to. The list has no common names, just Kurdish and the scripts used.

OK, in that list Kurdish appears like this:

Kurdish [ku]

  • Arabic (Arab)
  • Cyrillic (Cyrl)
  • Latin (Latn)

Then, following this list, you would show Kurdish under three names:
1- Kurdish/Arabic
2- Kurdish/Cyrillic
3- Kurdish/Latin
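
As a rough illustration of the three entries above, a language code and a script code are commonly combined into a single tag in the BCP 47 style (language subtag, hyphen, script subtag). The sketch below is only an assumption about how such tags could be generated; the helper name `script_tags` and the dictionary are hypothetical and are not taken from Pontoon or Common Voice.

```python
# Script names and their four-letter codes, as listed for Kurdish [ku] above.
SCRIPTS_FOR_KU = {"Arabic": "Arab", "Cyrillic": "Cyrl", "Latin": "Latn"}


def script_tags(language_code, scripts):
    """Return one 'language-Script' tag per script, ordered by script name.

    Hypothetical helper for illustration only.
    """
    return [f"{language_code}-{code}" for _, code in sorted(scripts.items())]


print(script_tags("ku", SCRIPTS_FOR_KU))
# prints ['ku-Arab', 'ku-Cyrl', 'ku-Latn']
```

Whether the platform would expose these as separate locales is exactly the open question in this thread, so treat the tags as notation rather than as existing locale identifiers.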

Thanks for the info. We are still scoping how to support different scripts, which is something we haven’t done yet.

I’ve passed this to the dev team and we will come back once we have more clarity on a plan. Thanks for your patience!

I brought this up with the Pontoon/Localisation team in February. Here is what I wrote:

I just went to look at Kurdish in Pontoon, and it appears that the two-letter language code, ‘ku’, is being used for Sorani (Central) Kurdish.
Usually the two-letter code is used for Kurmanji (Northern) Kurdish. But it might be better to just use the three-letter codes:
ckb – Sorani
kmr – Kurmanji
Examples from Wikipedia:
https://ku.wikipedia.org/wiki/Destpêk
https://ckb.wikipedia.org/wiki/دەستپێک
I don’t know what the process is for changing this, or if it can be changed.
Thanks!
Fran
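
To make the proposal above concrete, here is a minimal sketch of how the ambiguous two-letter code could be resolved to one of the three-letter codes. It assumes the script is known and that, as discussed in this thread, Arabic script implies Sorani and Latin script implies Kurmanji. The function and table names are hypothetical; this is not how Pontoon or Common Voice actually handle locales.

```python
# Three-letter codes proposed in the thread, with their dialect names.
NAMES = {"ckb": "Sorani", "kmr": "Kurmanji"}

# Assumption for illustration: the script decides which dialect 'ku' means.
VARIETY_BY_SCRIPT = {"Arab": "ckb", "Latn": "kmr"}


def resolve_kurdish_code(code, script=None):
    """Map 'ku' plus a script code to 'ckb' or 'kmr'; pass specific codes through."""
    if code in NAMES:
        return code  # already unambiguous
    if code == "ku":
        if script in VARIETY_BY_SCRIPT:
            return VARIETY_BY_SCRIPT[script]
        raise ValueError("'ku' is ambiguous without a known script")
    raise ValueError(f"not a Kurdish code handled by this sketch: {code!r}")


print(resolve_kurdish_code("ku", "Arab"))  # prints ckb
print(resolve_kurdish_code("ku", "Latn"))  # prints kmr
```

Raising an error for a bare `ku` is a deliberate choice in this sketch: it avoids silently picking one dialect, which is the misclassification described in this thread.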

@osmanoca @Fatih_Kurt @ok_alp I recommend getting in contact with the Locale lead for Kurdish on Pontoon, Davud Kakaie.

Please feel free to reply here, or visit us on Matrix, in #Common Voice and in #Pontoon.

I brought the issue up on #Pontoon just now:

Apparently the right place is actually #l10n-community, I’ve brought it up there too.

The Common Voice team are now planning to remove ku and migrate the content to ckb. Here is the issue:

This confusion was expected. When I first began working on translating Common Voice, I contacted Peipying about the expected situation for Kurdish, which stems from the fact that Kurdish includes 2 major dialects: Central (ckb), which primarily uses Arabic script, and Kurmanji, which is based on Latin script. I also raised the possibility of adding ckb so I could work on translating that, but the decision back then was to continue with the current status until further notice. Now is the time to act. Considering that most of the Kurdish population speaks Kurmanji, keeping ku as the general code for Kurdish is also possible; however, sticking with ISO codes is the best we can do: ckb (Sorani/Central, Arabic script) and kmr (Kurmanji).

Yep, I agree with this. I am trying to contact the Kurmanji speakers from this post, with some success, but if you know anyone else, please consider passing it on.

Update on Kurmanji:

Thanks for the persistence and follow up all. :slight_smile:

Dear ladies and gentlemen,

Kurdish consists of 3 languages: Kurmanjî (Northern), Soranî (Central) and Kalhurî (Southern).
Zazaki and Gorani/Hawrami are independent Northwestern Iranian languages. This is also the linguistic classification. There is no linguistic reason or argument to classify Zazaki as Kurdish.

I ask that the classification of Zazaki in Mozilla be corrected to “Zazaki”.
The phrase “Kurdî (Zazakî)” would sound like “German (English)” or “Farsî (Kurdî)”.

Best regards
