Request to add my language, Burushaski

Dear All,

I am following "📖 Readme: How to see my language on Common Voice" to add a new language. The language is Burushaski, see

The main problem is that Burushaski is a spoken-only language, so no official script exists.

What I need is to create a dataset of voice recordings and later use it to generate text automatically.

Please help!

Hello and welcome,

As indicated in the Readme, this is the list of languages and scripts we follow for this project.

Common Voice works by displaying text for people to read aloud. The resulting pairs of text and audio form a dataset that is used to train speech-to-text technology.

If this language doesn’t have a written form, I don’t know how we can be helpful.


It looks like the Burushaski Language Documentation Project (http://burushaskilanguage.com) included some work on standardizing a written form. They also mention creating pedagogical materials for learning their writing system, and 20 hours of audio with text available here: https://digital.library.unt.edu/explore/collections/BURUS/

If there is a community of Burushaski speakers who are interested in developing and using a speech recognition system, it seems like the first step would be for them to decide on a writing system and learn to read and write in it. If no one can read the language, how could the output of a speech recognizer be of any use?

Although there would be many challenges, I think there is a path forward for including this language. I wonder if the materials available from the project above would be enough for your community to get started with using the writing system?