Getting a Language (Korean) to Be Speaking/Listening Ready

Chiarella · November 2, 2022, 5:35pm

Hello. For the Korean language, there are over 5000 sentences collected and the site is over 90% localized (actually 100% last I checked). What is left to be done before one can contribute?

The statistics in the Sentence Collector (actually, the parameters, as stats are just attributes of a sample and not the whole thing) say that the language has over 5000 validated sentences. Yet the page https://commonvoice.mozilla.org/en/languages lists 100% localization but under 4000 sentences collected.

What’s up?

I’d really like this to get up and running and generate more interest as Korea has many people online but not in the CC0/open source/free software world.

We need Koreans to help generate spoken sentences in the collector, too, as much of the public domain stuff is written material from the 1930s or the translated version of the Christian Bible that is intentionally in archaic speech to mimic the King James Bible. (Yes, as in translated from the English, not the source Hebrew, Aramaic, and Greek. I know, I know.) I intentionally “skip” those sentences on review. Those sentences are … not necessarily wrong, but I don’t know enough to tell whether “ye people” or “you people” is correct, for example, and the sentences are next to useless for training an AI. I also rejected a fair number of sentences, obviously contributed by native Anglophones living in Korea or something, because they are grammatically broken or are “translation-ese.”

Basically, we need to do everything we can to get Korean Koreans interested and able to take part and get this going.

bozden · November 2, 2022, 6:59pm

Hi @Chiarella, good news

AFAIK, Sentence Collector exports the sentences weekly and Common Voice imports them about bi-weekly. Both are automated processes, so the numbers on the two sites can differ for a while.
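To make the lag concrete, here is a minimal sketch of how long a validated sentence might wait before it shows up on Common Voice. It is only an illustration: the schedule anchor dates are made up, "bi-weekly" is assumed to mean every two weeks, and the function names `next_run` and `visible_on` are hypothetical, not part of either tool.

```python
from datetime import date, timedelta

EXPORT_EVERY_DAYS = 7    # "weekly" export from Sentence Collector
IMPORT_EVERY_DAYS = 14   # assumption: "bi-weekly" read as every two weeks


def next_run(after: date, anchor: date, every_days: int) -> date:
    """First scheduled run on or after `after`, for runs at anchor + k * every_days."""
    elapsed = (after - anchor).days
    periods = -(-elapsed // every_days)  # ceiling division
    return anchor + timedelta(days=max(periods, 0) * every_days)


def visible_on(validated: date, export_anchor: date, import_anchor: date) -> date:
    """Date a sentence validated on `validated` would reach Common Voice."""
    exported = next_run(validated, export_anchor, EXPORT_EVERY_DAYS)
    return next_run(exported, import_anchor, IMPORT_EVERY_DAYS)


if __name__ == "__main__":
    # Hypothetical anchors: both jobs assumed to have run on 2022-10-03.
    anchor = date(2022, 10, 3)
    validated = date(2022, 11, 2)
    print(visible_on(validated, anchor, anchor))
```

Under these assumptions a sentence waits up to about a week for the export and then up to about two more weeks for the import, which would be enough to explain a gap between the two counts.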

> generate more interest

There is one Telegram group for cv-Korean, you can find the link here:

github.com/common-voice/common-voice, docs/COMMUNITIES.md (branch main)

# Community Participation Guidelines

The Mozilla Project welcomes contributions from everyone who shares our goals and wants to contribute in a healthy and constructive manner within our community. As such, we have adopted this code of conduct and require all those who participate to agree and adhere to these Community Participation Guidelines in order to help us create a safe and positive community experience for all. Please read the community participation guidelines on [https://www.mozilla.org/about/governance/policies/participation/](https://www.mozilla.org/about/governance/policies/participation/)

## Why was the list created?

Many language communities are self organising and have their contact channels on diverse systems. It would be cool to keep a list of them so that when someone wants to get in contact they know where to go. Here are a couple to start with:

## Channels

* General:
  * Common Voice on [Matrix](https://app.element.io/#/room/#common-voice:mozilla.org) [official]
  * Common Voice on [Discourse](https://discourse.mozilla.org/t/about-common-voice-readme-first/17218) [official] learn more about how you can request a language specific sub-discourse thread on our readme.
  * Common Voice on [Telegram](https://t.me/mozilla_common_voice)
* Bashqort (`ba`):
  * [Telegram](https://t.me/bashkort_voice)
* Belarusian (`be`):
  * [Website](https://mova.pro)
  * [Telegram](https://t.me/voice_by)
* Bengali (`bn`):

This file has been truncated.

Chiarella · November 2, 2022, 8:32pm

Thank you for the explanation and the Telegram link!
