Sentence collector for Japanese language (日本語の文章について)

On reading Kanji characters

Japanese version: 漢字の讀み方について

The way a number is read depends on context and might introduce confusion in the dataset.

I have always wondered about this sentence from the How-to as well.
The way a kanji is read depends on context too! And most kanji have two or more readings of their own.

I'll list as many as I can think of.

A. same meaning / same character / different reading

I think this is what @Adrijaned is concerned about:

  • 0 (Rei / Zero / Maru) meaning "Zero". Maru is a limited reading.
  • 4 (Shi / Yon) meaning "Four". Both are major readings.
  • 7 (Shichi / Nana) meaning "Seven". Both are major readings, too.
  • 明日 (Ashita / Asu / Myōnichi) meaning "tomorrow". Asu and Myōnichi are a bit formal.
  • 昨日 (Kinō / Sakujitsu) meaning "yesterday". Sakujitsu is a bit formal, too.
  • 重複 (Chōfuku / Jūfuku) meaning "duplicate". Are there more people who read it as Jūfuku?
  • 経緯 (Keii / Ikisatsu) meaning "circumstance". Are there more people who read it as Keii?
  • 世論 (Seron / Seiron / Yoron) meaning "public opinion". I'm sure most people don't know about Seiron. It is generally read as Yoron.

Certainly, the context can narrow down the reading to some extent. But that is a tendency, not an absolute rule. How a speaker reads a word depends on their knowledge and lifestyle (e.g. occupation, amount of reading, etc.). Or, more to the point, it can be a matter of preference. Therefore, when we are asked to read something "correctly", we are perplexed. "They are all correct, aren't they?"

The speech algorithm needs to know how to read everything.
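To make this concrete, here is a minimal Python sketch of how one might flag sentences that contain a word with more than one accepted reading. The readings come from the list in section A. The table `MULTI_READING` and the function `find_ambiguous_words` are hypothetical names I made up for illustration, not part of any existing tool, and the plain substring check is an assumption that ignores proper word segmentation.

```python
# Hypothetical lookup table: word -> accepted readings (from the list in section A).
MULTI_READING = {
    "明日": ["ashita", "asu", "myōnichi"],
    "昨日": ["kinō", "sakujitsu"],
    "重複": ["chōfuku", "jūfuku"],
    "経緯": ["keii", "ikisatsu"],
    "世論": ["seron", "seiron", "yoron"],
}


def find_ambiguous_words(sentence):
    """Return (word, readings) pairs for every listed word found in the sentence.

    Uses a simple substring check, so it is only a rough filter.
    """
    return [(word, readings)
            for word, readings in MULTI_READING.items()
            if word in sentence]


if __name__ == "__main__":
    for word, readings in find_ambiguous_words("明日は世論調査があります。"):
        print(word, "/".join(readings))
```

A sentence flagged this way is not wrong. It only means that two speakers may legitimately record it differently.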

B. same meaning / different character / same reading

Which one is used depends on the nuance of each character, or simply on preference.

  • 暗黒 (An-Koku) / 闇黒 (An-Koku)
  • 日差し (Hi-Za-shi) / 陽射し (Hi-Za-shi)

C. different meaning / same character / different reading

The reading depends on the context and the word.

  • 小人 (Ko-Bito) / 小人 (Ko-Domo)
  • 最中 (Sai-Chū) / 最中 (Mo-Naka)
  • 落着 (Raku-Chaku) / 落ち着く (O-chi-Tsu-ku)
  • 過去 (Ka-Ko) / 過ぎ去る (Su-gi-Sa-ru)
  • 明るい (Aka-rui) / 暗い (Kura-i) / 明暗 (Mei-An)

Example

  • ここは人気があります。
    • ここは人気 (Nin-Ki) があります。(This place is popular.)
    • ここは人気 (Hito-Ke) があります。(There are signs of people here.)

Yes, it's impossible to determine how to read it from such a short context.
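As a rough illustration of why this matters for a dataset, the sketch below enumerates every candidate reading of a sentence. It assumes the sentence has already been split into tokens (real segmentation would need a morphological analyzer, which is not shown here). `READINGS` and `candidate_readings` are hypothetical names used only for this example.

```python
from itertools import product

# Hypothetical table of words from section C whose reading changes with meaning.
READINGS = {
    "人気": ["ninki", "hitoke"],
    "最中": ["saichū", "monaka"],
}


def candidate_readings(tokens):
    """List every combination of readings for a pre-segmented sentence.

    Tokens without an entry in READINGS are kept unchanged.
    """
    options = [READINGS.get(token, [token]) for token in tokens]
    return [" ".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for line in candidate_readings(["ここ", "は", "人気", "が", "あります"]):
        print(line)
    # ここ は ninki が あります
    # ここ は hitoke が あります
```

Each extra ambiguous word multiplies the number of candidates, so even short sentences can end up with several equally valid recordings.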

D. different meaning / different character / same reading

So-called 同音異義語 (Dōon-Igi-Go), meaning "homophones".

  • けんとう (Ken-Tō): 見当 / 拳闘 / 軒灯 / 健闘 / 検討 / 賢答 and more.
  • せいかく (Sei-Kaku): 正確 / 性格 / 正格 / 精確 / 醒覚 and more.
  • いし (I-Shi): 石 / 意志 / 医師 / 遺志 / 遺子 and more.
  • かなう (Kana-U): 適う / 叶う / 敵う

Example 1

  • きじにかけているぶぶんがある (Kiji ni kakete iru bubun ga aru)
    • 記事に書けている部分がある。(There are parts of the article that are written well.)
    • 記事に欠けている部分がある。(There is a part of the article that is missing.)
    • 生地に欠けている部分がある。(There is a part of the fabric that is missing.)
    • 生地に掛けている部分がある。(There is a part that is hung over the fabric.)
    • Um, more?

All Japanese pronunciations can be written in hiragana, but here's why they shouldn't be. Of course, there is a difference in intonation between 書けて and 欠けて. But 記事 and 生地 are the same. If we're trying to figure out the meaning from a hiragana sentence, we're going to need more "background".
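The same point in code form: a hiragana string maps to several kanji spellings, so the lookup alone cannot pick one. This is only a sketch using the words listed in section D. `HOMOPHONES` and `kanji_candidates` are hypothetical names, and the table is deliberately incomplete.

```python
# Hypothetical table: hiragana reading -> kanji spellings (from the list in section D).
HOMOPHONES = {
    "けんとう": ["見当", "拳闘", "軒灯", "健闘", "検討", "賢答"],
    "せいかく": ["正確", "性格", "正格", "精確", "醒覚"],
    "いし": ["石", "意志", "医師", "遺志", "遺子"],
    "かなう": ["適う", "叶う", "敵う"],
}


def kanji_candidates(reading):
    """Return the known kanji spellings for a hiragana reading (empty if unknown)."""
    return HOMOPHONES.get(reading, [])


if __name__ == "__main__":
    print(kanji_candidates("いし"))       # ['石', '意志', '医師', '遺志', '遺子']
    print(len(kanji_candidates("けんとう")))  # 6
```

Choosing among the candidates needs the surrounding sentence, and sometimes more background than the sentence itself provides.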

Example 2

  • ここではきものをぬぎます。
    • ここで履き物を脱ぎます (Koko de hakimono wo nugimasu) (This is where you take off your footwear.)
    • ここでは着物を脱ぎます (Koko dewa kimono wo nugimasu) (This is where you take off your kimono.)

It's a common pun. Like "Ice Cream" and "I Scream"? It's pronounced a little differently, though.