Getting a Language (Korean) to Be Speaking/Listening Ready

Hello. For the Korean language, there are over 5000 sentences collected and the site is over 90% localized (actually 100% last I checked). What is left to be done before one can contribute?

In the Sentence Collector, the statistics (actually, the parameters, as stats are just attributes of a sample and not the whole thing) say that the language has over 5000 validated sentences. The page https://commonvoice.mozilla.org/en/languages, however, lists 100% localization but under 4000 sentences collected.

What’s up?

I’d really like this to get up and running and generate more interest as Korea has many people online but not in the CC0/open source/free software world.

We need Koreans to help generate spoken sentences in the collector, too, as much of the public domain stuff is written material from the 1930s or the translated version of the Christian Bible that is intentionally in archaic speech to mimic the King James Bible. (Yes, as in translated from the English, not the source Hebrew, Aramaic, and Greek. I know, I know.) I intentionally “skip” those sentences on review. Those sentences are … not necessarily wrong, but I don’t know enough to tell whether “ye people” or “you people” is correct, for example, and the sentences are next to useless for training an AI. I rejected a fair number of sentences obviously contributed by native Anglophones living in Korea or something because they are grammatically broken or are “translation-ese.”

Basically, we need to do everything we can to get Korean Koreans interested and able to take part and get this going.

Hi @Chiarella, good news :slight_smile:

AFAIK, Sentence Collector exports the sentences weekly and Common Voice imports them about bi-weekly; these are automated processes.

> generate more interest

There is one Telegram group for cv-Korean, you can find the link here:

Thank you for the explanation and the Telegram link!