Add Tamazight language

l10n
(Romeo Kienzler) #1

I’m working with people in Morocco to get Amazigh contributions. Any chance of adding it to the list in Pontoon?

📖 Readme: How to see my language on Common Voice
(Rubén Martín) #2

I’m not a language expert, but as long as languages (and not accents within a language) are requested on Pontoon, there should be no problem.

Just flagging this because, according to Wikipedia, Tamazight is a set of different Berber languages (so maybe the individual ones should be requested).

1 Like
(Romeo Kienzler) #3

I’ve spoken to some native Amazigh people in Morocco today, and they told me that Amazigh is the language and the rest are dialects. But they also told me that some speakers don’t understand each other, so it’s a bit hard to tell. Since we won’t be able to collect a very large corpus anyway, I guess it might make sense to define Amazigh as the language and the rest as dialects.

In the end, DeepSpeech can be trained selectively on the whole language or on individual dialects only, correct?

(Rubén Martín) #4

Before starting any effort, it would be wise to get a linguist’s opinion on this. We should make sure we enable recognized languages with their official locale codes.

@jbeatty Is there a resource that could help us make a decision here?

1 Like
(Lissyx) #5

You can train on whatever data you feed it.

(Taqbaylitassa) #6

@romeokienzler
Welcome.

I’m from Kabylia. I’ve been involved with Mozilla localization for years, and now with Common Voice. We are a team of Kabyle localizers and activists working on the kab locale.

As far as I know, Moroccan Standard Amazigh is not yet fully standardized. We discussed it last January in Tangier, Morocco, with language activists from Morocco.

I recommend launching separate corpora for the three major Berber languages: Tarifit, Tacelḥit and Tamaziɣt. Common Voice deals with natural human language.

I also recommend using the Latin script, as Kabyle does. Please get in touch with people involved in linguistics studies in Morocco to learn more about the situation.

Here are the language codes:
Tachelḥit (shi) : https://iso639-3.sil.org/code/shi
Tarifit (rif) : https://iso639-3.sil.org/code/rif
Tamaziɣt n waṭlas alemmas, i.e. Central Atlas Tamazight (tzm) : https://iso639-3.sil.org/code/tzm

If you are only interested in Common Voice, I think you have to choose one of these codes. You should not mix different language structures and phonetics within one corpus; that’s why I’m asking you to get in touch with students and researchers working on Amazigh languages.
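
As a rough illustration of keeping one corpus per language code, here is a minimal Python sketch. It is not part of Common Voice or Pontoon; the function name `group_by_language` and the `(code, text)` input format are assumptions made only for this example. The three codes are the ISO 639-3 codes listed above.

```python
from collections import defaultdict

# ISO 639-3 codes listed in this post.
BERBER_CODES = {
    "shi": "Tachelḥit",
    "rif": "Tarifit",
    "tzm": "Tamaziɣt (Central Atlas)",
}


def group_by_language(sentences):
    """Group (code, text) pairs into one sentence list per language code.

    Hypothetical helper: rejects any code outside BERBER_CODES so that
    sentences from different languages never end up in the same corpus.
    """
    corpora = defaultdict(list)
    for code, text in sentences:
        if code not in BERBER_CODES:
            raise ValueError(f"unknown language code: {code!r}")
        corpora[code].append(text)
    return dict(corpora)


if __name__ == "__main__":
    # Placeholder strings stand in for real collected sentences.
    collected = [
        ("shi", "sentence 1"),
        ("rif", "sentence 2"),
        ("shi", "sentence 3"),
    ]
    for code, texts in group_by_language(collected).items():
        print(code, BERBER_CODES[code], len(texts))
```

Each resulting list could then be submitted or trained on separately, under its own code.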


3 Likes
(Rubén Martín) #7

Thanks for the information @taqbaylitassa, really informative :slight_smile:

1 Like
(Taqbaylitassa) #8

@nukeador
By the way I’m Belkacem Mohammed, the admin of the kab locale on Pontoon. :smile: I lost the last access :wink:

1 Like
(Romeo Kienzler) #9

Thanks a lot, @taqbaylitassa. Let me get 5K sentences in at least one dialect and come back to you. Good point about the Latin script, since most of the contacts we’ve found can’t read or write the native script anyway…

1 Like
(Taqbaylitassa) #10

In which variant? I have contacts among Tachelḥit, Tarifit and Tamaziɣt speakers from Morocco who can help validate sentences. Could you please ask your contacts to look at the Kabyle localization of Common Voice? They could have a lot of things to share with some of them, Tacelḥit for example.

(Slimane Selyan AMIRI) #11

Thanks for the information. Actually, Tamazight is a set of very different languages; each language has its own ISO code and its own morphosyntactic structure.

(Slimane Selyan AMIRI) #12

I’m also part of the Kabyle team that localizes on Pontoon, and I also contribute to Common Voice in Kabyle. You can download the Latin Kabyle keyboard from our website: Imsidag.com/anasiw-aqbayli