Request to add my language: Burushaski

Dear All,

I am following "📖 Readme: How to see my language on Common Voice" to add a new language. The language is Burushaski, see

The main problem is that Burushaski is a spoken-only language, so no official script exists.

What I need is to create a dataset of voice recordings and later use it to generate text automatically.

Please help!

Hello and welcome,

As indicated in the Readme, this is the list of languages and scripts we follow for this project.

The way Common Voice works is by displaying text for people to read aloud. The resulting pairs of text and audio form a dataset that is used to train speech-to-text technology.

If this language doesn’t have a written form, I don’t know how we can be helpful.


It looks like the Burushaski Language Documentation Project (http://burushaskilanguage.com) included some work on standardizing a written form. They also mention creating pedagogical materials for learning their writing system, and 20 hours of audio with text available here: https://digital.library.unt.edu/explore/collections/BURUS/

If there is a community of Burushaski speakers who are interested in developing and using a speech recognition system, it seems like the first step would be for them to decide on a writing system, and learn to read and write in it. Without anyone able to read the language, how could the output of a speech recognizer be of any use?

Although there would be many challenges, I think there is a path forward for including this language. I wonder if the materials available from the project above would be enough for your community to get started with using the writing system?


Thank you Craig.
Unfortunately, our language has no writing system yet. Our community of Burushaski speakers is almost done finalizing one, including Burushaski keyboard support.
The material available at the link should be enough to start. The first step is to start writing in our language, and many of our community members can already write in it.

I am not sure how to proceed.

As described in the Readme, I think the next steps would be having the language added in Pontoon, so that you can begin translating the user interface; and beginning to collect sentences in the Sentence collector. As members of your community begin writing, they could consider licensing their writings as CC-0 when possible and contributing them to the sentence collector.

You might also consider setting up a Wikipedia. A few sentences can be collected from Wikipedia for use in this project, although the Wikipedia license is generally not compatible.

This is just a list of ISO 639-3 language codes, each paired with a script name. Although it is not on this list, Burushaski does have a 639-3 code: bsk (see [here](http://www.endangeredlanguages.com/lang/1614), for example), and the script proposed in the Burushaski Language Documentation Project is Latin (the code “Latn” is used in nukeador’s list). It looks like the script is composed of the ASCII letters, plus vowels with macron (āīēōū), including some digraphs and trigraphs. @amjadsaleem is this the script your community is using?
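
If that is roughly the script, one way to screen sentences before contributing them could look like the sketch below. It is only an illustration under assumptions: the allowed letters are my reading of the description above (ASCII letters plus āīēōū), the punctuation set and the function name `unexpected_characters` are hypothetical, and the sample strings are placeholders rather than real Burushaski. Your community's final alphabet should replace these sets.

```python
import unicodedata

# Assumption: ASCII letters plus the macron vowels mentioned above.
ASCII_LETTERS = set("abcdefghijklmnopqrstuvwxyz")
MACRON_VOWELS = set("āīēōū")
# Hypothetical punctuation set; adjust to whatever the community decides.
ALLOWED_PUNCTUATION = set(" .,?!'-")

ALLOWED = ASCII_LETTERS | MACRON_VOWELS | ALLOWED_PUNCTUATION


def unexpected_characters(sentence: str) -> list[str]:
    """Return the sorted characters in `sentence` that fall outside the assumed script."""
    # NFC merges a base vowel followed by a combining macron into one
    # precomposed character, so both ways of typing "ā" are treated alike.
    normalized = unicodedata.normalize("NFC", sentence).lower()
    return sorted({ch for ch in normalized if ch not in ALLOWED})


# Placeholder strings, not real Burushaski sentences.
print(unexpected_characters("a\u0304b Ū"))  # nothing unexpected
print(unexpected_characters("abc ñ 1"))     # flags the digit and "ñ"
```

Digraphs and trigraphs need no special handling here, since they are built from letters already in the set.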