Sentence collector copyright issues

This thread is for reporting copyright issues with sentences submitted to the sentence collector tool. If you find sentences that come from a source licensed in any way other than CC-0 (public domain works), please report them in a reply here.

When reporting, please supply at least:

  • Name of the person who submitted the sentences.
  • What was submitted as “source” for the sentences.

Optionally, you can also submit a link to the actual text the sentences were copied from.

How do I find the required information?

Currently, the easiest approach is to visit the following URL:
https://kinto.mozvoice.org/v1/buckets/App/collections/Sentences_Meta_<languageCode>/records, replacing <languageCode> with the two-letter code of the language the sentences were submitted to. For example, https://kinto.mozvoice.org/v1/buckets/App/collections/Sentences_Meta_en/records or https://kinto.mozvoice.org/v1/buckets/App/collections/Sentences_Meta_cs/records.

That address should show you the JSON data for the sentences in the collector tool. Search it for one of the sentences you suspect was submitted in violation of our copyright requirements; what you need are the username and source fields of that sentence.

For example, in Firefox, when you are in the JSON view (selected using the tabs at the top; it should also be the default after the page loads), expand the data array by clicking the little triangle in front of it. Then type a long enough part of the sentence, as far as you can remember it, into the filter field just below the tabs at the top of the page. With a long enough part of the sentence typed, you should see just one number below “data”. Remember that number, then clear the filter box. Scroll down until you find the number you remembered, click the little triangle next to it, and copy here what you find on the lines following the words username and source.
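If you prefer a script to clicking through the JSON viewer, the lookup can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names are made up for illustration, and the name of the field holding the sentence text ("sentence") is a guess, so check one record in the JSON and adjust `text_field` if it differs. Only the `data` array and the username and source fields are taken from the description above.

```python
import json
import urllib.request

BASE_URL = "https://kinto.mozvoice.org/v1/buckets/App/collections"


def records_url(language_code):
    """Build the records URL for a two-letter language code."""
    return f"{BASE_URL}/Sentences_Meta_{language_code}/records"


def fetch_records(language_code):
    """Download the JSON document and return its "data" array."""
    with urllib.request.urlopen(records_url(language_code)) as response:
        payload = json.load(response)
    return payload.get("data", [])


def find_submissions(records, fragment, text_field="sentence"):
    """Return username and source for every record containing the fragment.

    text_field is an assumption: the name of the field that stores the
    sentence text is not stated in this thread.
    """
    needle = fragment.casefold()
    return [
        {"username": record.get("username"), "source": record.get("source")}
        for record in records
        if needle in str(record.get(text_field, "")).casefold()
    ]
```

For example, `find_submissions(fetch_records("cs"), "part of the sentence you remember")` would list the username and source of each matching record. The sketch reads a single response only, so if the server splits a large collection across several pages, some records may be missed.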

If for any reason you are unable to do all that, you can also just copy & paste here a few of the sentences you suspect of breaking our copyright policy, and we will manage :slight_smile:

Sources used in the Polish collection that do not fall into the CC0 category:

This is taken care of.

These Georgian sentences are not in the public domain:

  • “username”: “rigormortis”, “source”: “https://ka.wikibooks.org*”.
  • “username”: “Geor”, “source”: “Own work” – movie scripts, without a CC0 license.
  • “username”: “rigormortis”, “source”: “https://ka.wikiquote.org*”.

Also, please remove the approved sentences with “invalid” flags. Most of them have typos.

Can you elaborate a bit more here? I’m a bit hesitant to just remove anything that ever got one invalid vote.

Then just remove those marked as invalid by Razmik, he found many mistakes.

Thanks!

Thanks

This is taken care of.

The Polish review tab is currently filled with segments from The Lord of the Rings, which is very much not public domain. The submitter didn’t even bother slicing it into sentences… Username is narid, source is given as the book (again!).

Thanks for reporting this. These have been removed.
