Hey, I think the very specific goal of Common Voice can’t be achieved this way. I quote: “Common Voice is Mozilla’s initiative to help teach machines how real people speak.” We don’t speak formal/classical Arabic. Every region has its own distinctive version of Arabic, and these differ a lot. The reason why classical Arabic is the formal one is really complicated, and I think it’s related to religious concerns: Muslims believe they have to preserve classical Arabic because it’s the language of their holy book (the Qur’an). The Arabic dialects are “how real people speak”.
Hi Ahmed, people have discussed this several times, for example here:
It is always possible to add a new language or dialect to Common Voice, but be aware that this means a lot of work and you have to be able to collect many sentences under a public domain licence in that dialect.
I once wanted to add a German dialect, but it turned out to be very difficult because there is no standardized form and not enough sentences are available. If your variety of Arabic has its own version of Wikipedia, your chances are good that you will be able to start it.
EDIT: looks like Egyptian Arabic is the only Arabic dialect that has its own Wikipedia: https://arz.wikipedia.org/wiki/مصريين
That’s good, I’m from Egypt.
Can you please tell me how to add a new language?
Here is a post about the process:
The basic process is:
- Someone from the Common Voice team has to add the new language code to all tools (is it arz or something like ar_egypt?) @nukeador can you help here?
- Translate the website
- Collect sentences (Wikipedia extraction often works best, but you can also collect sentences manually)
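
To give an idea of what the sentence collection step could look like, here is a minimal Python sketch that splits raw text (for example, text extracted from Wikipedia) into sentences and keeps only the ones that look readable aloud. The word limits, the character range and the function names are my own assumptions for illustration, not the official Common Voice rules or tools, so check the current sentence guidelines before relying on them.

```python
import re

# Hypothetical limits for illustration; the real Common Voice rules may differ.
MIN_WORDS = 3
MAX_WORDS = 14

# Letters and diacritics from the basic Arabic block (U+0621 to U+0652).
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")


def split_sentences(text):
    """Split raw text on sentence-ending punctuation, including the Arabic question mark."""
    parts = re.split(r"[.!?\u061F]+", text)
    return [p.strip() for p in parts if p.strip()]


def is_candidate(sentence):
    """Keep sentences of a readable length that contain only Arabic-script words."""
    # Treat the Arabic comma as a word separator.
    words = sentence.replace("\u060C", " ").split()
    if not MIN_WORDS <= len(words) <= MAX_WORDS:
        return False
    # Digits, Latin letters and other symbols make a word fail this check.
    return all(ARABIC_WORD.fullmatch(w) for w in words)


def collect(text):
    """Return de-duplicated candidate sentences in their original order."""
    seen = set()
    result = []
    for sentence in split_sentences(text):
        if is_candidate(sentence) and sentence not in seen:
            seen.add(sentence)
            result.append(sentence)
    return result
```

Calling `collect(raw_text)` on a block of extracted text returns the list of sentences that passed the filter. Sentences with numbers or Latin words are dropped here because contributors would have to guess how to read them, but a native speaker still needs to review the output by hand.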