Allow copyrighted text with a take down notice

daniel.abzakh · August 26, 2020, 9:33am

Hello,
My understanding is that Common Voice is currently in maintenance mode, but I would like to suggest a feature that would be especially beneficial for low-resource languages: allowing copyrighted text with a takedown notice.
I could work on this feature, and submit a pull request.

I have 1.2 million sentences in Abkhazian, and possibly in other languages, but I can’t use them because they are copyrighted. We could add a takedown notice so that contributors understand the risk; they would then have the choice to contribute either to this copyrighted data set or to the more reliable CC0 data set.

Regards,
Nart.

Adrijaned · August 26, 2020, 2:57pm

I understand what you are aiming for with this, but to me it unfortunately sounds an awful lot like suggesting that we intentionally break the law and hope no one complains.

daniel.abzakh · August 26, 2020, 4:07pm

Definitely, we don’t want to get anyone upset! If the owner doesn’t want their text to be used for research and non-commercial use, we should protect their privacy, and the text will be taken down and related voice records deleted.

Adrijaned · August 26, 2020, 4:45pm

Have you thought about trying to contact the authors of your texts in advance and securing their permission first? If they agree to release their texts in some form for this project, there will be no further problem including them, and if they disagree, that is still better than including the texts first without their permission and then getting sued for copyright infringement.

daniel.abzakh · August 26, 2020, 5:03pm

You are right!
That’s a clean solution, so they would probably allow the usage of the text under some terms.
The next step is: how can we include all this text in Common Voice?

lissyx · August 26, 2020, 5:27pm

Just make sure you can get this released as CC-0; importing is not a big issue.

stergro · August 26, 2020, 6:53pm

I did this for Esperanto. I asked blogs and web magazines, and most of them were happy to donate sentences to the project.

I always make clear that I will only use sentences with fewer than 15? words and that the dataset will be released as CC0. So there is no recognizable text, only a list of sentences that they give away for free.

Adrijaned · August 26, 2020, 7:01pm

It depends on the size and quality of the individual sources. If there is one big source, or potentially several smaller ones of highly comparable quality, you could just extract them all into one file, one sentence per line, and submit it in a pull request to the Common Voice repository. Then you would have to get preferably at least two or three people to do quality assurance on those sentences; if you ask how in the PR, someone will definitely gladly guide you. If the sources you have in mind are mostly smaller individual works, e.g. articles from some blogs or, depending on the case, even individual books, I’m afraid you will just have to import them into the Sentence Collector and pass them through the normal process.
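The "one file, one sentence per line" step could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the function names `split_sentences` and `write_sentence_file` are made up for this example, and the splitter is deliberately naive (it breaks after `.`, `!` or `?` followed by whitespace), so it will mis-split abbreviations and may not suit every language.

```python
import re
from pathlib import Path


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption, not a Common Voice tool):
    break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def write_sentence_file(texts: list[str], out_path: str) -> int:
    """Write every sentence from every text to out_path, one per line.

    Returns the number of sentences written.
    """
    sentences: list[str] = []
    for text in texts:
        sentences.extend(split_sentences(text))
    Path(out_path).write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return len(sentences)
```

The resulting file would still need the quality review described above before anyone should treat it as ready for a pull request.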

daniel.abzakh · August 26, 2020, 8:17pm

Did you get a handwritten release form for these sentences? I did that for the first 5000 sentences that I have submitted here.

stergro wrote: “So there is no recognizable text only a list of sentences that they give away for free.”

Are you sure of this?! Common Voice could barely collect 3 sentences per article from Wikipedia due to copyright limitations.

stergro · August 26, 2020, 9:56pm

Well, I saw how many sentences disappeared after I filtered them by length, foreign letters, structure and so on. Often I could only use around a third of a text. But this wasn’t a legal argument, just an argument to take away some of the authors’ fears. Most authors care about their texts, not so much about an alphabetical list of sentences from their texts. I just had the feeling that this argument helps to get permission.
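A rough Python sketch of that kind of filtering, for anyone who wants to try it. It is not the script actually used: the names `uses_only_script` and `filter_sentences`, the 14-word limit (taken from "fewer than 15 words" earlier in the thread) and the script check based on Unicode character names are all assumptions, and the "structure" checks are left out because they were not described.

```python
import unicodedata

MAX_WORDS = 14  # assumption: "fewer than 15 words"


def uses_only_script(sentence: str, script: str = "LATIN") -> bool:
    """True if every alphabetic character's Unicode name contains `script`
    (e.g. "LATIN" or "CYRILLIC"). Punctuation, digits and spaces are ignored."""
    return all(
        script in unicodedata.name(ch, "")
        for ch in sentence
        if ch.isalpha()
    )


def filter_sentences(sentences, script="LATIN", max_words=MAX_WORDS):
    """Drop long or foreign-script sentences, remove duplicates,
    and return what is left as an alphabetically sorted list."""
    kept = set()
    for raw in sentences:
        sentence = " ".join(raw.split())  # normalise whitespace
        if not sentence:
            continue
        if len(sentence.split()) > max_words:
            continue  # too long
        if not uses_only_script(sentence, script):
            continue  # contains letters from another script
        kept.add(sentence)
    return sorted(kept)
```

For a Cyrillic-script language such as Abkhazian you would pass `script="CYRILLIC"`. Note that sorting and deduplicating only makes the list harder to read as a text; as said above, that is not a legal argument.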

daniel.abzakh · August 27, 2020, 8:39am

Have you tried to get them to sign a release form? Here is a link to a form that I have used previously.

Copyright waive form for sentence collection Common Voice

Would this copyright waive form for sentence collection be acceptable: I, _________________________________________, being the author and rights holder of the work created through my creative effort and identified in the table below, hereby give my consent and grant, free of charge, to the Common Voice project and the Mozilla project, as well as to other persons (society, the global community, the public), the right to use my work for public distribut…

stergro · September 19, 2020, 12:57pm

No, I just saved the email in which they confirmed their consent. This might be a little risky, but it was enough for me.
