Errors when converting Traditional Chinese characters to Simplified Chinese characters

Hello,
I read Simplified Chinese.
However, I keep coming across incorrectly converted Simplified characters that make the sentences hard to read. These conversions are simply wrong.

For example, the word should be “著名”, not “着名”: here “著” has been wrongly converted to “着” and should have stayed as “著”.
Another example is “乾隆”, which appears as “干隆”: here “乾” should not have been converted to “干” at all.

I’ve reported them as Grammatical / spelling errors, but they haven’t been corrected so far.

I think it would be better to fetch the source sentences directly from Simplified Chinese sources. Machine conversion can sometimes produce mistakes like those above.
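
To illustrate why this kind of mistake happens, here is a minimal sketch in Python, assuming the converter maps characters one at a time. The tables `CHAR_MAP` and `PHRASE_MAP` and both function names are hypothetical and made up for this example; I don't know how the actual conversion was done. The point is that “乾” and “著” each have more than one correct Simplified form depending on the word, so a per-character table gets words like “乾隆” and “著名” wrong unless phrases are checked first.

```python
# Hypothetical character-level table; a real converter's table is far larger.
CHAR_MAP = {"乾": "干", "著": "着", "國": "国", "體": "体"}

# Hypothetical phrase-level exceptions, checked before the character table.
PHRASE_MAP = {"著名": "著名", "乾隆": "乾隆"}


def convert_naive(text):
    """Convert one character at a time, ignoring the surrounding word."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def convert_with_phrases(text):
    """Try phrase-level exceptions first, then fall back to single characters."""
    out = []
    i = 0
    while i < len(text):
        for phrase, target in PHRASE_MAP.items():
            if text.startswith(phrase, i):
                out.append(target)
                i += len(phrase)
                break
        else:
            # No phrase matched at this position: convert a single character.
            out.append(CHAR_MAP.get(text[i], text[i]))
            i += 1
    return "".join(out)


sentence = "乾隆是著名的皇帝"
print(convert_naive(sentence))         # 干隆是着名的皇帝
print(convert_with_phrases(sentence))  # 乾隆是著名的皇帝
```

With the naive version both words come out wrong, while the phrase-aware version leaves them intact and still converts “乾燥” to “干燥”.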

Hi,

Welcome to the community!

What’s your estimate of the percentage of sentences with issues? Currently zh-CN has a limited number of sentences exported from the zh-CN Wikipedia, and after some quality control at the time we estimated that the percentage was low enough to be acceptable.

Thanks!

Hi,
Thank you for your reply.
Based on two days of reading, I estimate that about 4%-6% of sentences are affected, or possibly more.

If this [1] is the sentence file, then it contains a total of 288 lines with the mistakes mentioned above. That is not a big percentage of the 49981 lines overall, but it is high enough that it should not be ignored.

[1] https://github.com/mozilla/voice-web/blob/master/server/data/zh-CN/wiki.zh-cn.txt
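
For anyone who wants to repeat the count, here is a rough sketch of how it could be done in Python. It only looks for the two wrong forms mentioned in this thread; the names `KNOWN_BAD`, `find_suspect_lines` and `error_rate` are made up for this example, and the local file name in the comment is an assumption about where the file has been saved.

```python
# Wrong form -> expected form, limited to the two examples from this thread.
KNOWN_BAD = {"着名": "著名", "干隆": "乾隆"}


def find_suspect_lines(lines):
    """Return (line_number, [wrong forms found]) for each line with a known mistake."""
    hits = []
    for number, line in enumerate(lines, start=1):
        found = [bad for bad in KNOWN_BAD if bad in line]
        if found:
            hits.append((number, found))
    return hits


def error_rate(hit_count, total):
    """Share of affected lines, as a fraction between 0 and 1."""
    return hit_count / total if total else 0.0


# Assumed usage against a local copy of the sentence file:
# with open("wiki.zh-cn.txt", encoding="utf-8") as f:
#     hits = find_suspect_lines(f)

sample = ["乾隆皇帝", "他是着名的作家", "干隆年间"]
print(find_suspect_lines(sample))
print(f"{error_rate(288, 49981):.2%}")  # the counts quoted above
```

A substring search like this can only find mistakes that are already known, so it gives a lower bound on the number of affected lines rather than a full count.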