Konkani and its Variants: Which script to pick for the MCV website interface?

Hello Konkanis!
I am trying to work out which Konkani script to use for the Mozilla Common Voice (MCV) website. The Konkani dataset itself will support all the other scripts too (multi-orthography), so people will still be able to contribute in their script of choice. However, the MCV website interface cannot be in multiple scripts.

The scripts currently used to write Konkani are Devanagari, Romi, Kannadi, Malayalam, and Perso-Arabic.

I invite all Konkani speakers to share which script they would prefer for the website (please also mention where you are from and which dialect you speak).

I have made my own points below in an attempt to answer the following questions.

Main Questions:

  1. Would a Konkani MCV website in देवनागरी (Devanagari) script be easy to understand for Konkani literates in Karnataka (KA), Goa (GA) and Maharashtra (MAH)?
  2. Given that most jobs require us to read and write in English, would the Roman script be easier to understand for most Konkani speakers (including MAH, KA and GA)?
  3. Can the Kannadi script be understood by GA and MAH speakers? (It's an obvious no, given that they read and write in either Devanagari or Roman script.)
  4. Can Konkani speakers from MAH and KA understand the words/vocabulary used in Goa?

Important note

  • Other scripts will still be supported in the dataset. This discussion is only about the website interface language.

Brief notes about the konkani language

  1. Konkani is mainly spoken in Goa, but Goans are not the only ones who speak it. It is spoken across the Konkan region: the northern part of the Konkan region is in Maharashtra, the central part in Goa, and the southern part in Karnataka.
  2. Konkani is officially recognised as an individual language as per the Language Census 1977 by the Govt. of India. It is not a dialect of Marathi.
  3. Goan Konkani, Maharashtrian Konkani and Karnataka Konkani are broad categories of Konkani variants/dialects, but even within these states there are differences in how it is spoken and written.
  4. Within Goa, the Antruzi variant is promoted as the standard in schools and colleges, but other variants/dialects are also classified under "Goan Konkani", such as Bardesi and Saxtti.
  5. In Karnataka, Konkani dialects include GSB (Gaude Saraswat Brahmin) and GSC (Gaude Saraswat Christian Brahmin).
  6. Lobab, writing on https://extraetc.wordpress.com, says:

Any effort of thrusting one dialect as the standard – such as what is happening in Goa, will lead into the disintegration of the language, slowly but surely!

  7. The Wikipedia article on Konkani, under "Current status and issues", reads:

Konkani language has been in danger of dying out over the years, one of the reasons being the fragmentation of Konkani into various, sometimes mutually unintelligible, dialects.

  8. While a website in Roman script would be easier to read for most young people, they might not know conventions such as 'm' and 'n' being used to mark nasalized vowels. Forcing the Roman script on all Konkani speakers could deter contributors because they might not understand the words. Apart from nasalization, there is also vowel substitution in most words (अ in the Devanagari script becomes ऑ in the Roman script).

  9. A Devanagari website would be easier to translate into other languages, but harder to read because of the modifiers above and below the letters. Sometimes 2-3 letters are combined into one cluster, which looks very complicated on screen and is really difficult to read. However, Devanagari users can increase the font size.


Konkani in Karnataka (Canara, Canarese)

  1. The question papers (search Google for "konkani language karnataka question papers") of recent (2017-2024) final-year high school (10th standard) exams in the Konkani subject are in both Devanagari and Kannada script in the State of Karnataka.
  2. However, the Konkani syllabus/curriculum is prepared mainly in the Kannada script, which suggests that KA students read and write Konkani mainly in that script.
  3. Class 10 Hindi exams are conducted in Devanagari script (search Google), but Hindi is only a 3rd language, meaning students can choose to study another language in its place. So some students may get little or no exposure to the Devanagari script.
  4. The Kannada language (in the same script) is taught as either the 1st or 2nd language in schools up to 10th std. English is also either the 1st or 2nd language.
  5. Conclusion: speakers of this Konkani variant may not find websites in Devanagari easy to understand.
  6. Source: Karnataka School Examination and Assessment Board website

Konkani in Maharashtra

  1. There is no mention of "Konkani" as a subject for SSC and HSC schools on the Maharashtra board of education website. Hence I haven't been able to retrieve question papers from previous years.
  2. There is mention of a "Konkan Divisional Board" under the Maharashtra Education Board, which may be teaching Konkani in the Ratnagiri and Sindhudurg districts. But I have not found a single question paper. I tried contacting the Konkan Divisional Board by email, but "the address was not reachable".
  3. They most likely use the Devanagari script to write Konkani, as Marathi is also written in Devanagari.
  4. (Based on news reports) Speaking and studying Marathi is "mandatory by law without exception" in Maharashtra's schools.
  5. Open question: can speakers of this variant understand websites translated into "standard" Goan Konkani?
  6. Maharashtrian Konkani is in a gray area between being a dialect of Marathi and a dialect of Konkani.

Konkani in Goa

  1. The question papers for the Konkani subject exams in 10th, 11th and 12th standard (schools and pre-university) are in Devanagari script ONLY in the State of Goa (search Google for "konkani language goa board question papers"). (Only the 10th standard question papers for 2018 and 2019 are uploaded. I am not aware of any changes made to the Konkani subject after NEP 2020 was implemented.)
  2. The 10th std. Konkani Assessment Scheme prepared by the Goa Board of Secondary Education is also written in Devanagari script.
  3. Konkani is the local language of the people of Goa.
  4. Devanagari is given more attention when new words are coined. It is the official script for Konkani used in schools and government offices in the State of Goa.
  5. The Roman script is currently used mainly by Christians in Bibles, select weeklies, magazines and theatre-drama (tiatr).
  6. Naturally, a large number of people in Goa have studied Konkani in Devanagari script.
  7. (Politics) No proposal to include Konkani written in Roman script in Goa’s official language Act: CM Sawant

Kerala Konkani

  1. Survey of Konkani in the State of Kerala (1971)
  2. Does Kochi have Konkani speakers? Which script do they use? Do they follow a standard?

For example:
I vote in favour of the Devanagari script for the Konkani website because:

  1. Google can currently translate only Konkani written in Devanagari script (देवनागरी लिपी कोंकणी) into other languages (English, Hindi, Kannada, Malayalam, etc.).
  2. Most Konkani literates understand at least one language other than Konkani.
  3. We can keep reading the sentences in whichever Konkani script we are working with (any of the 5 scripts), even with translation turned on. In my tests on Chrome and Firefox (Firefox with the "TWP Translate Web Pages" extension), everything was translated except the "sentence cards". This preserves the core functionality of the Common Voice website even when the user depends on translation software.

I just talked with a Konkani professor from Karnataka. They may really need the website to be in Kannada script.

  1. For them, Kannada is taught from Standard/Grade 1, while Hindi (Devanagari script) is taught only from Grade 5, so the Kannada script is given more preference in Karnataka.

  2. The other reason is vocabulary: many Konkani words used in Karnataka are different from Goan Konkani.


This is an unfair choice. Konkani is probably the only language in the world currently written in five scripts. This diversity has to be taken into account; technology has to be adaptable.
I initiated the Konkani Wikipedia in the Incubator sometime around 2006-2007. There we took a conscious decision to allow ALL scripts to function simultaneously. We managed (though still struggling) with three scripts: http://gom.wikipedia.org. We would have loved to work with the two smaller communities too (Malayalam script and Perso-Arabic) if possible.
Please consider how you could do your best to accommodate all. That would really help.
FN/Frederick Noronha
+91-9822122436

@Frederick_Noronha
Unfortunately, like most websites, Mozilla's localisation system does not support localising the website into multiple scripts under a single language code.

Website localisation is done per language code (ISO 639). Most major languages have one language code; Konkani has three: kok, gom, knn.

I think we should use these three to create separate locales for the Romi, Kannada and Devanagari scripts. Separating the localisation by script rather than by region (Maharashtra/Goa/Karnataka/Kerala) would be better, as Romi is largely based on the Bardezi dialect, Kannada on Mangluri and Devanagari on Antruzi. We can always make room for mixing in vocabulary from other dialects to remain inclusive :wink:

For the other two scripts (Malayalam & Perso-Arabic), would it be better to establish new ISO 639 language codes, since Konkani currently has only three?

Respectfully disagree.

IMHO, CV should adopt BCP-47 language tags instead of ISO 639-3 codes (it currently uses a mix of ISO 639-1, ISO 639-3 and BCP-47 codes; see this code).

The reason is that BCP-47 allows distinguishing a spoken language from its orthographic variants. For example, Azeri (az) can be written in either Cyrillic or Latin script, and I think in Arabic script too (not sure).

The BCP-47 codes defined for Azeri are:

  • az - Azeri, irrespective of orthography or geographic variant
  • az-Cyrl - Azeri as written in Cyrillic, irrespective of geographic variant
  • az-Latn - Azeri as written in Latin, irrespective of geographic variant
  • az-Latn-AZ - Azeri as written in Latin, as spoken in Azerbaijan

That is, BCP-47 allows a finer-grained representation of written and spoken language, including the ability to distinguish between multiple orthographies of the same spoken language.
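To make the distinction concrete, here is a minimal Python sketch that splits a BCP-47 tag into its language, script, region and variant subtags. It is a simplification (extlang subtags, grandfathered tags and extension contents are not interpreted), and `parse_tag` / `LanguageTag` are hypothetical names, not part of Common Voice or Pontoon. Script subtags come from ISO 15924, where the five Konkani scripts correspond to `Deva`, `Latn`, `Knda`, `Mlym` and `Arab`; which base subtag (`kok`, `gom` or `knn`) would be used is left open, so `kok-Knda` below is only an illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageTag:
    """Simplified view of a BCP-47 tag: language, script, region, variants."""
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = field(default_factory=list)
    rest: List[str] = field(default_factory=list)  # extensions / private use, unparsed


def parse_tag(tag: str) -> LanguageTag:
    """Parse the common subtags of a BCP-47 tag (simplified, see lead-in)."""
    parts = tag.split("-")
    result = LanguageTag(language=parts[0].lower())
    i = 1
    # Script subtag: exactly 4 letters (ISO 15924), e.g. Latn, Cyrl, Deva, Knda
    if i < len(parts) and len(parts[i]) == 4 and parts[i].isalpha():
        result.script = parts[i].title()
        i += 1
    # Region subtag: 2 letters (ISO 3166-1) or 3 digits (UN M.49)
    if i < len(parts) and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        result.region = parts[i].upper()
        i += 1
    # Variant subtags: 5-8 alphanumerics, or 4 characters starting with a digit
    while i < len(parts) and parts[i].isalnum() and (
        5 <= len(parts[i]) <= 8 or (len(parts[i]) == 4 and parts[i][0].isdigit())
    ):
        result.variants.append(parts[i].lower())
        i += 1
    # Whatever remains starts with a singleton such as "t" or "x"
    result.rest = parts[i:]
    return result


for example in ["az", "az-Cyrl", "az-Latn-AZ", "kok-Knda"]:
    print(example, "->", parse_tag(example))
```

With this split, `az-Cyrl` and `az-Latn` share the language subtag `az` and differ only in the script subtag, which is exactly the "same spoken language, different orthography" relationship described above.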


@kathyreid, the problem is that a single language with multiple scripts can only have a single frontend language, which is defined in Pontoon.

So you can define az in Pontoon and as the dataset language in CV (they should be in parallel), and you can then have sentence and speech variants like the others you listed above (BCP-47). But you have to choose one script (Cyrillic or Latin in this case) for the frontend.

Ah, now I understand. Is it possible to have multiple front-ends, one for each orthography, connected to the same CV dataset?
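The arrangement described above (one shared Pontoon/CV language code, one frontend script, several written variants) can be sketched as a small data model. This is a hypothetical illustration that assumes nothing about Common Voice's actual schema: the class, its fields, and the `code-Script` form of the UI locale are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommonVoiceLanguage:
    """Hypothetical model: one spoken language, one frontend script,
    many sentence/speech variants."""
    code: str                      # shared Pontoon locale and CV dataset code, e.g. "az"
    frontend_script: str           # the single script used for the website UI
    variants: List[str] = field(default_factory=list)  # BCP-47 tags of written variants

    def add_variant(self, tag: str) -> None:
        # A variant must share the dataset's primary language subtag
        if tag.split("-")[0].lower() != self.code.lower():
            raise ValueError(f"{tag!r} is not a variant of {self.code!r}")
        if tag not in self.variants:
            self.variants.append(tag)

    def ui_locale(self) -> str:
        # Exactly one interface locale, whatever scripts the sentences use
        return f"{self.code}-{self.frontend_script}"


az = CommonVoiceLanguage(code="az", frontend_script="Latn")
az.add_variant("az-Cyrl")
az.add_variant("az-Latn")
print(az.ui_locale(), az.variants)
```

Under this model, Konkani would get a single `ui_locale()` in one script, while sentences in the other scripts would still be collected as variants of the same dataset.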

@kathyreid, unfortunately what you say is not possible.

It is "technically" possible if you completely divide the dataset (e.g. into az-Cyrl and az-Latn), but AFAIK that is not desired, as they are variants, not languages. You can have the same "sounds" in both datasets (due to transliteration, for example), and you should join them. The Konkani dataset would be divided into 5 :frowning:

I'm currently helping with the Circassian languages (ady, kbd), whose largest diaspora communities are in Turkey. I had to create a transliteration variant (e.g. ady-Latn-TR-t-ady-cyrl, the Latin-based Turkish alphabet) because very few people here can read Cyrillic. But the frontend has to be Cyrillic, so they will click "blindly". We need to teach them with online courses (press this, then that, etc.), or better, have them start with the Turkish interface; they can switch.
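The tag above uses the BCP-47 `t` extension (RFC 6497, "Transformed Content"): the part before `-t-` describes the text as written (Circassian in Latin script, as used in Turkey), and the part after it names the source it was transformed from (Circassian in Cyrillic). A minimal Python sketch that separates the two; `split_transformed` is a hypothetical helper, and it assumes `t` is the only extension present and carries just a source tag.

```python
from typing import Optional, Tuple


def split_transformed(tag: str) -> Tuple[str, Optional[str]]:
    """Split a tag carrying the BCP-47 't' (transformed content) extension
    into (resulting tag, source tag).

    Simplification: assumes 't' is the only extension in the tag and that it
    carries just a source language tag (no fields such as 'm0').
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    # Index 0 is always the primary language subtag, so search from index 1
    if "t" not in lowered[1:]:
        return tag, None
    idx = lowered.index("t", 1)
    source = "-".join(parts[idx + 1:])
    return "-".join(parts[:idx]), source or None


print(split_transformed("ady-Latn-TR-t-ady-cyrl"))
print(split_transformed("az-Latn"))
```

A tag without the extension comes back unchanged with no source, so plain script variants and transliterated variants can be handled by the same code path.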

Sorry to hijack the thread @chasingdragonflies


I have noticed that on Pontoon, Romansh (rm) has been set up with BCP-47 language tags for its standard variants: rm-sursilv and rm-vallader. Can't the same be done for Konkani? In Karnataka, the Mangalorean variety of Konkani is popular, and since it is Karnataka, almost all Konkanis there write in the Kannada script.

But yes, @bozden, you're right: that doesn't make it a separate language.

Still, a significant number of Konkani speakers come from Karnataka, almost as many as in Goa, and they deserve the Konkani website in the Kannada script as well.

Also, Konkani has many dialects. Even though Devanagari is the so-called "standard" and "official" script in Goa, its dialect (Antruzi) is spoken by only a small number of people. In Goa, some of the dialects are Pednekari, Bardezi, Saxtti, Antruzi, Gaudi, Kunbi, Kankonkari, etc.

The major varieties are Bardesi, Saxtti and Antruzi.

Historically, Saxtti was the standard written form in the 16th and 17th centuries, but today it survives largely as a spoken dialect.

In Karnataka, there are 42 dialects in total. According to one Konkani professor, the major spoken dialects are Kudmi, Siddi, GSB and Mangalorean Catholic.