Volunteering to help add Sanskrit and Kannada to the Common Voice project

Dear Partners,

I am a Professor of Electrical Engineering at the Indian Institute of Science, Bangalore. We are working on ASR for Sanskrit and Kannada. I see that neither language is currently part of the Common Voice initiative. My students and I can help by providing text in both languages to get the speech data collection started.

Kindly let me know how we can start.

Regards, and thanks in advance!

Ramakrishnan A G

Please check 📖 Readme: How to see my language on Common Voice

To quote:

:open_book: Mozilla Voice Community Playbook: The source of truth for setting up and maintaining self-sustainable communities.

Hello everyone,

I would like to open this topic to summarize the answer to one of the most frequently asked questions we get: how do I get my language into Common Voice?

There are three steps to have your language ready:

:globe_with_meridians: Have the website localized on Pontoon

If your language is not there yet, please open a new topic in this category requesting it, and indicate the language and its script.

:hammer: Skills needed: English knowledge, strong knowledge of your language.

Reference: Common Voice languages and accent strategy v5

:open_book: Gather a large number of public domain (CC-0) sentences

:hammer: Skills needed: Command line usage and git, familiar with regular expressions.

:white_check_mark: Submit and review more sentences from other sources (not Wikipedia)

These sentences are incorporated into the database using the Sentence Collector tool.

:hammer: Skills needed: Strong grammar knowledge of the target language you are contributing to.

If you have found an existing public domain corpus larger than 100K sentences, we have a separate process to handle it, since we understand that manual validation through the Sentence Collector is not ideal at that scale.

Please create a new topic here so we can evaluate if your corpus fits the license and size requirements to run this process.

:hammer: Skills needed: Expertise in processing and cleaning up text, and linguistic/language expertise to check the quality of the resulting sentences.

:next_track_button: Next step

Once you have enough validated and reviewed sentences (usually over 5000), we can enable your language to accept voice recordings on the site. At that point you might wonder: My language is now collecting voice, what do I need to know?

:warning: Please note that you will have to keep adding sentences so that more recordings can be collected without repeating sentences.

Feel free to add any questions to this topic and we will be happy to support you :slight_smile: