Sentence collector copyright issues

sinumade · September 25, 2020, 4:47pm

The Japanese sentence collector has the following problems:

“username”: “navta”, “source”: “http://www.edrdg.org/wiki/index.php/Tanaka_Corpus”
1. “sentence”: “あきらめたら、そこで試合終了ですよ。”
  - From SLAM DUNK.
  - Ref: あきらめたら、そこで試合終了ですよ。 - Google Search
2. “sentence”: “我が生涯に一片の悔いなし。”
  - From 北斗の拳 (Fist of the North Star).
  - Ref: 我が生涯に一片の悔いなし。 - Google Search
3. “sentence”: “僕は新世界の神となる。”
  - From DEATH NOTE.
  - Ref: 僕は新世界の神となる。 - Google Search
4. “sentence”: “あんたらの名前なんか興味ないね。どうせこの仕事が終わるとお別れだ。”
  - From ファイナルファンタジーVII (Final Fantasy VII).
  - Ref: あんたたちの名前なんか興味ないね。 - Google Search
  - The wording differs slightly from the original.

Perhaps this is a problem with the corpus.

I went to the source page and checked the "Public Domain version", and it contains the text above. These sentences come from famous manga and games, which are obviously not in the public domain. The "Public Domain version" file has a [Manga] flag, but some of these sentences are not flagged. Honestly, I can't determine how much offending text is mixed in.
