Dialects or Language Variants

I am contributing Burushaski language data to the Common Voice project, and I want to ensure fair representation of all its dialects. As a speaker of the Hunza Burushaski dialect, most of the data I have submitted so far reflects only my variety of the language. However, Burushaski has four other dialects that are currently not represented in the project.

I am using a writing system that has been agreed upon by speakers of all dialects, which makes it possible to include data from other varieties as well. To achieve this, I aim to encourage poets and writers from other dialect groups to participate and contribute their voices and texts.
Could you provide me with guidance on how the Common Voice project handles such situations to ensure inclusivity? Also, how can I effectively engage speakers of other Burushaski dialects and convince them to contribute?

Hey @Karim_Piar, Common Voice supports Variants, which cover both sentence variants (e.g. different scripts and/or influences from other languages due to geographic region) and speech style (dialect). There is also “Accent” support, which can be both pre-defined and free-form.

Those should be worked out and added to the system through a PR on GitHub, though. That includes the BCP-47 language codes, which you will probably need to check. If you have linguists in the community and/or can reach one at a university, that would be best. The distribution of speakers across countries (India?) can be especially important.
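To make the BCP-47 part a bit more concrete, here is a minimal sketch of checking whether a candidate tag is well-formed before proposing it. It covers only a simplified subset of the BCP-47 syntax with conventional casing. The dialect label `hunza`, the helper names and the use of private-use (`x-`) subtags are assumptions for illustration only; the codes Common Voice actually accepts are decided in the PR. `bsk` is the ISO 639-3 code for Burushaski.

```python
import re

# Simplified BCP-47 shape: language[-Script][-REGION][-variant...][-x-private...]
# Assumption: conventional casing only; extensions and grandfathered tags are ignored.
_TAG_RE = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[A-Z][a-z]{3}))?"
    r"(?:-(?P<region>[A-Z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?:-x(?P<private>(?:-[a-z0-9]{1,8})+))?$"
)


def is_well_formed(tag: str) -> bool:
    """Return True if the tag matches the simplified BCP-47 shape above."""
    return _TAG_RE.match(tag) is not None


def make_private_tag(language: str, dialect: str) -> str:
    """Build a hypothetical private-use tag such as 'bsk-x-hunza'."""
    tag = f"{language.lower()}-x-{dialect.lower()}"
    if not is_well_formed(tag):
        raise ValueError(f"not a well-formed tag: {tag!r}")
    return tag


if __name__ == "__main__":
    print(make_private_tag("bsk", "Hunza"))
    print(is_well_formed("bsk_x_hunza"))
```

Being well-formed only means the tag has the right shape. It does not mean a subtag is registered, so a linguist should still confirm the final codes.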

Here is what I did for the Circassian languages, a rather complex case due to diaspora and transliteration. Check this GitHub PR. And this is for accents.

When these are defined, they will be visible in the forms. For example:

  • The profile form will show variants and a preset list of accents (if those are also defined; the accent field will still be free-form for cases like foreign-speaker accents).
  • Sentence addition workflows will include variant selection (to support script changes like Arabic/Latin/Cyrillic, changes in written form or meaning due to dialect, and/or loanwords due to being a non-national language, for example).