Romansh has 5 different varieties (idioms): Sursilvan, Vallader, Surmiran, Puter, Sutsilvan

As part of the project «72 uras», we want to gather as many young people as possible from the 16th to the 19th of January 2020 to help collect voice recordings in all five Romansh varieties for Common Voice (https://72h.ch/de/projektdetail?tx_seventytwohours_piprojects[project]=46&tx_seventytwohours_piprojects[action]=detail&tx_seventytwohours_piprojects[controller]=Project&cHash=97b11bd3b295b13dd18b0b97a3fa8eab), in collaboration with the youth organization Giuventetgna Rumantscha GiuRu (https://www.giuru.ch/) and the Romansh umbrella organization Lia Rumantscha (http://www.liarumantscha.ch/).

At the moment, only two varieties are available on Pontoon: Sursilvan and Vallader. We have also heard that new languages won’t be accepted for the moment. But having all five varieties available is crucial for the project. Hence the question: can we expect the missing varieties to be accepted as part of the Common Voice project soon enough to give us the time we need to translate the interface (let’s say by the 15th of December), or do we have to cancel the project?

I hope that we will be able to realize the project and contribute to Common Voice!

PS: You might wonder why such a small language as Romansh should have five varieties represented in Common Voice, while there is just one for English. Apart from the linguistic arguments, there is a practical one: a Romansh speaker who speaks, for instance, Vallader won’t be able to pronounce a sentence in Sursilvan.

For further information, see https://en.wikipedia.org/wiki/Romansh_language and especially https://en.wikipedia.org/wiki/Romansh_language#Sample_text

Hi,

Thanks for posting. We have a legal limitation on age: we can’t collect voices from people under 19, as explained in our site’s terms of use.

As for languages, we are currently finalizing our strategy for them, but right now we are considering a text dataset-language a common writing system that contains the same words, grammar and script, acknowledging that non-formal expressions can happen in different territories.

It’s highly likely that we will ultimately rely on this standardized list:

https://www.unicode.org/cldr/charts/latest/supplemental/languages_and_scripts.html

For this case Romansh seems to be considered just one language, but we will be able to capture voice varieties on audio clips once we implement our new accents strategy.

And these are the requirements for a language to be launched and to have enough data to train an STT model:

Thank you for your answer. I agree with most of your points, but the Romansh idioms actually do differ in lexicon and grammar. That’s why an average speaker won’t be able to pronounce sentences in other «dialects» (idioms). So, if all varieties were mixed, a contributor would have to check and skip around 4 sentences before finding one they can read. That’s not feasible.

Hi. I’m grateful that Romansh Sursilvan, and soon also Romansh Vallader, can be part of the Common Voice project. For Romansh, this project is not just a ‘nice to have’; it’s both a great opportunity and a big contribution to our efforts to promote and preserve the language.
I’d like to reinforce the importance of recording all five written forms of Romansh. In the isolation of the Alps, Romansh developed into five different written forms and many more spoken dialects. So when we say, for example, Romansh Sursilvan, we mean a written form that covers the different dialects of the Surselva region. Even within this area, it’s a compromise between ‘Ju ò bétg bugen tè’ and ‘Jau ai buga bugén tai’, which ends up as ‘Jeu hai buca bugen tei’, for ‘I don’t like you’. Being able to record all five written compromises, one for each region, is crucial for the promotion of this minority language of 60,000 speakers. At school, we teach five written forms, each region its own. The canton of Graubünden is currently creating new language books for every single region (http://www.mediomatix.ch/products/). So it’s great to have the Common Voice project for the bigger idioms, Romansh Sursilvan and Romansh Vallader, but it is at least as important for the smaller ones: Romansh Sutsilvan, Romansh Puter and Romansh Surmiran.

I refer to your text:

but right now we are considering a text dataset-language a common writing system that contains the same words, grammar and script, acknowledging that non-formal expressions can happen in different territories.

I completely understand this point of view. But Romansh doesn’t fit this description. We have different grammar, different orthography, different idiomatic expressions, and a different formal register for each of the 5 written forms. And then, of course, within each of the 5 written forms we have non-formal expressions as well, as dialects usually do.

I can’t stress enough how important this project is for the future of Romansh, in all its written forms. Thank you for reconsidering our enquiry.

Hi,

It seems our list of languages and what you are explaining here are in conflict, and we might need to do some additional research.

I would like to set expectations clearly: we currently don’t have any bandwidth for additional investigation into languages that don’t fit this list. We might be able to find time to look into this during 2020, but I can’t make any promises at this point.

Thanks for your understanding.

Hi,

I understand now. Your list obviously contains errors. I’ll file a bug report asking for the information to be corrected. Romansh is a national language of Switzerland, and an official language in dealings with persons who speak this language. Furthermore: Persons who speak Romansh may address the federal authorities in its idioms or in Rumantsch Grischun. The authorities answer in Rumantsch Grischun.

https://www.admin.ch/opc/en/classified-compilation/19995395/index.html#a4

The National Languages are German, French, Italian, and Romansh.

Art. 5 Official languages

The official languages of the Confederation are German, French and Italian. Romansh is an official language in dealings with persons who speak this language.

Art. 6 Choice of language

What is an idiom in our context:

DeepL translation of the German text:
An idiom of the Grisons Romansh (Rhaeto-Romanic) language spoken in the Swiss canton of Graubünden is a language variety standardized as a written language, which represents a standardized spelling for each regionally connected group of the spoken dialects varying from municipality to municipality.

Exactly what I said, just in better words. :wink:
I’ll let you know as soon as the errors on the list have been corrected.

Kind regards

Hi
I’m referring to the list you linked to. I thought I would need to add the written variants of Romansh to this list, but found that subtags had already been registered in 2010.
http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
If you search for Romansh, you’ll find 13 entries, including the following 6 written forms:

[Image: screenshot of the 6 Romansh written-form entries in the IANA language subtag registry]

Here you’ll find some more information on the topic:
https://www.iana.org/
https://www.w3.org/International/articles/language-tags/
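To make the subtag idea concrete, here is a minimal sketch of how the six written forms could be addressed as language tags of the form `rm-<variant>`. The subtag spellings are given as I read them in the registry linked above, so please check them against the current file before relying on them. The helper names `make_tag` and `describe` are made up for this illustration and are not part of Common Voice or Pontoon.

```python
# Variant subtags listed with Prefix "rm" in the IANA registry
# (spellings as read from the registry; verify before relying on them).
ROMANSH_VARIANTS = {
    "puter": "Puter",
    "rumgr": "Rumantsch Grischun",
    "surmiran": "Surmiran",
    "sursilv": "Sursilvan",
    "sutsilv": "Sutsilvan",
    "vallader": "Vallader",
}


def make_tag(variant: str) -> str:
    """Build a language tag such as 'rm-sursilv' (hypothetical helper)."""
    key = variant.lower()
    if key not in ROMANSH_VARIANTS:
        raise ValueError(f"unknown Romansh variant subtag: {variant!r}")
    return f"rm-{key}"


def describe(tag: str) -> str:
    """Return the written form's name for a tag like 'rm-puter' (hypothetical helper)."""
    language, _, variant = tag.lower().partition("-")
    if language != "rm" or variant not in ROMANSH_VARIANTS:
        raise ValueError(f"not a recognised Romansh variant tag: {tag!r}")
    return ROMANSH_VARIANTS[variant]


if __name__ == "__main__":
    # Print every tag together with the written form it stands for.
    for subtag in ROMANSH_VARIANTS:
        tag = make_tag(subtag)
        print(tag, "->", describe(tag))
```

With tags like these, each written form could in principle be kept as its own text dataset instead of being mixed under a single `rm` entry.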

Given this information, I kindly ask you to open Common Voice for:

rumantsch puter
rumantsch sutsilvan
rumantsch surmiran

Thanks for your effort.

Thanks for the links. As I mentioned previously, we are relying on this Unicode standard list.

Unfortunately, we don’t have the bandwidth to change this right now. We have been working for months on the new accents and language strategy, and we won’t be able to make progress in this area until that’s resolved and implemented (hopefully mid-to-late this year).

Note this strategy includes a way to have Romansh as a dataset language and capture the different Romansh sound variations, which will allow us to capture people speaking in all the variations you mentioned.

Thanks for your understanding.

Hi. I appreciate your work and effort. So, thanks very much anyway.
I still disagree that Romansh fits into your new accents and language strategy, but I’m happy to wait, and there will be enough work for us to do with Vallader and Sursilvan until then. But I get it: I’ll have to do something about the Unicode standard list, otherwise we can’t move forward in this area.
My best,
Conradin