The Japanese language collector has the following problems:
- “username”: “navta”, “source”: “http://www.edrdg.org/wiki/index.php/Tanaka_Corpus”
- “sentence”: “あきらめたら、そこで試合終了ですよ。”
- From SLAM DUNK.
- Ref: Google search for あきらめたら、そこで試合終了ですよ。
- “sentence”: “我が生涯に一片の悔いなし。”
- From 北斗の拳 (Fist of the North Star).
- Ref: Google search for 我が生涯に一片の悔いなし。
- “sentence”: “僕は新世界の神となる。”
- From DEATH NOTE.
- Ref: Google search for 僕は新世界の神となる。
- “sentence”: “あんたらの名前なんか興味ないね。どうせこの仕事が終わるとお別れだ。”
- From ファイナルファンタジーVII (Final Fantasy VII).
- Ref: Google search for あんたたちの名前なんか興味ないね。
- The wording differs slightly from the original line (あんたら here, あんたたち in the search above).
Perhaps this is a problem with the corpus.
I went to the source page and checked the "Public Domain version", and it contains the sentences above. They come from famous manga and games, which are obviously not in the public domain. The "Public Domain version" file uses a [Manga] flag, but some of these sentences are not flagged. Honestly, I can't tell how much offending text is mixed in.
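
For anyone who wants to repeat the check, here is a minimal sketch of how the corpus file could be scanned for the sentences listed above. It assumes one entry per line and that the [Manga] flag appears literally on flagged lines. I have not confirmed either detail of the file format, and the function name `find_suspects` is just something I made up for illustration.

```python
from __future__ import annotations

from typing import Iterable

# Sentences reported above as quotes from copyrighted works.
SUSPECT_SENTENCES = [
    "あきらめたら、そこで試合終了ですよ。",
    "我が生涯に一片の悔いなし。",
    "僕は新世界の神となる。",
    "あんたらの名前なんか興味ないね。どうせこの仕事が終わるとお別れだ。",
]

# Assumption: flagged lines contain this marker as plain text.
MANGA_FLAG = "[Manga]"


def find_suspects(lines: Iterable[str]) -> list[tuple[int, str, bool]]:
    """Return (line_number, sentence, has_manga_flag) for every match."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for sentence in SUSPECT_SENTENCES:
            if sentence in line:
                hits.append((number, sentence, MANGA_FLAG in line))
    return hits
```

To run it against a downloaded copy, pass an open file object, e.g. `find_suspects(open(path, encoding=...))`, using whatever path and encoding the file actually has. Entries where the last field is `False` are the ones that appear in the file without a [Manga] flag. This only finds exact matches for known quotes, so it cannot tell how many other unflagged quotes are in the file.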