This thread is for reporting copyright issues arising from sentences submitted to the Sentence Collector tool. If you find sentences that came from a source licensed under anything other than CC0 (public domain works), please report them as a reply here. When reporting, please supply the “source”…

Sources used in the Polish collection that do not fall into the CC0 category: “source”: “From the book.”, “username”: “narid”. This seems to actually be the book “Biały Kieł” (White Fang), which is not CC0 as far as I know. The first Polish translation was done by Anna Trzeciakowska https://pl.wikipedia.org/wiki/Anna_Trzeciakowska and was…

These Georgian sentences are not in the public domain: “username”: “rigormortis”, “source”: “ https://ka.wikibooks.org *”. “username”: “Geor”, “source”: “Own work” (actually movie scripts, without a CC0 license). “username”: “rigormortis”, “source”: “ https://ka.wikiquote.org *”. Also, please remove th…

Quoting G12r: “Also, please remove the approved sentences with ‘invalid’ flags. Most of them have typos.” Can you elaborate a bit more here? I’m a bit hesitant to just remove anything that ever got one invalid vote.

Quoting mkohler: “Can you elaborate a bit more here? I’m a bit hesitant to just remove anything that ever got one invalid vote.” Then just remove those marked as invalid by Razmik; he found many mistakes. Thanks!

Thanks. Quoting G12r: “username”: “rigormortis”, “source”: “ https://ka.wikibooks.org *”. “username”: “Geor”, “source”: “Own work” (movie scripts, without a CC0 license). “username”: “rigormortis”, “source”: “ https://ka.wikiquote.org *”. Quoting G12r: Then just remove those marked as in…

The Polish review tab is currently filled with segments from The Lord of the Rings, which is very much not in the public domain. The submitter didn’t even bother splitting it into sentences… Username is narid, source is “From the book.” (again!).

Quoting Sobsz: “The Polish review tab is currently filled with segments from The Lord of the Rings, which is very much not in the public domain. Didn’t even bother slicing it into sentences… Username is narid, source is from the book. (again!).” Thanks for reporting this. These have been removed.

The Japanese sentence collection has the following problems: “username”: “navta”, “source”: “http://www.edrdg.org/wiki/index.php/Tanaka_Corpus”. The sentence “あきらめたら、そこで試合終了ですよ。” (“If you give up, the game is over right there.”) is from SLAM DUNK. Ref: Google Search for あきらめたら、そこで試合終了ですよ。 The sentence “我が生涯に一片の悔いなし。” (“I have not a single regret in my life.”) is from 北斗の拳 (Fist of the North Star). Ref: 我が生涯…

@sinumade, thanks for reporting. I’m not a lawyer, so I can’t really answer that. @mbranson @jscowcroft, any advice here?

Sentence collector copyright issues

Common Voice

mkohler (Michael Kohler), March 17, 2020, 11:10pm

Can you elaborate a bit more here? I’m a bit hesitant to just remove anything that ever got one invalid vote.

Related topics

Topic	Category and tags	Replies	Views	Last activity
Polish sentences concerns	Common Voice: sentence-collection, issue, dataset	20	3357	May 4, 2020
Extending our sentence collection capabilities	Common Voice: sentence-collection, announcements	19	3763	September 11, 2019
Sentence collection for Belarusian – request for advice	Common Voice: sentence-collection	16	1199	July 9, 2021
We want your feedback: Improving the sentence collection	Common Voice: sentence-collection, feedback	34	8983	December 17, 2018
Problems finding public domain sentences	Common Voice: sentence-collection	26	3078	June 10, 2019