Add Quran text as a new language

The Quran is the most popular book in the world and is read daily by millions of Muslims. Because of its special accent (Tajweed), it should be labelled as a new language. I promise that if you add it, your speech data will grow fast.

In general, we recommend not adding sentences from old books, because they contain expressions that are no longer used in modern languages.

In terms of languages, we are only adding full languages (in this case that would be Arabic), and then you can select your accent in your profile.

It would be good to get an Arabic expert here to tell us how useful having this accent would be for recognizing modern spoken Arabic.


There is some research on this, but I am not a speech recognition professional.

@myaccount is right in that Quranic Arabic is special. It is widely seen as a different language.

@nukeador - we cannot treat Quranic Arabic (or any other “dialect” of Arabic) as different accents.

We need at least (1) Modern Standard Arabic (aka - Fusha), (2) Quranic Arabic (aka - Classical Arabic), and a separate entry for other dialects (e.g. Levantine Arabic, Egyptian, etc.)

These are really different languages, with different writing norms, grammars, and pronunciation.



I’m also an Arabic speaker. There is no Quran Arabic. There is Standard Arabic, used in teaching and in mosques. Popular Arabics (plural) are used in real, daily communication.

I think we can’t create a corpus for each religious book in the world. It’s nonsense. There is an ongoing project to create a corpus for Standard Arabic. The best approach is to use sentences from the real language and follow the rules defined here:


@belkacem77 Yes, there is no Quran Arabic; it is Standard Arabic.

The Quran is not “each religious book”. It is a special religious book, and around the world it is read for thousands of hours every day. It is a miracle that a book is still read this much after about 1400 years.

The aim of the Common Voice project is to create a database for better voice recognition. Adding Quran text as a new language will not help this cause. The research papers cited are not relevant to the Common Voice project.

I do not think any text that does not resemble a language spoken in the real world today should be added to the Common Voice dataset, whether it is a religious text or not.