Secretly Public Domain

Books published before 1964 had to have their copyright renewed manually. Apparently about 80% were never renewed and are therefore in the public domain.

Secretly Public Domain is a project to determine which pre-1964 books are in the public domain.

https://www.crummy.com/2019/07/22/0

I assume this only applies to the US and not worldwide?

I would assume so, yes.

@dabinat Would the Bible be useful? I don’t know about its license, but it has a lot of sentences. Someone might argue that the text is old or too formal, but I think a good language model could handle that. I found this: http://christos-c.com/bible/

@Codigo_Logo_Programacao_e_Inteligencia_Artificial I remember this came up before about other religious books: Add Quran text as a new language

IMO the style is too different from how modern people speak.


Do you have an update on OpenSubtitles?

I’m not a lawyer so I don’t know the answer to the question of who owns the copyright if you write subtitles for a script that’s public domain.

If the copyright does go to the subtitle author, I don’t have an answer as to whether or not OpenSubtitles contributors are waiving that right by contributing to the site.

However, if the public domain status is confirmed by someone more knowledgeable, I would be more than happy to write a script to import them from an SRT/VTT.
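For reference, here is a minimal sketch of what such an import script might look like. It is only an illustration under stated assumptions: the function name `extract_sentences` is hypothetical, the input is assumed to be a well-formed SRT or VTT file already read into a string, and sentences are split naively on `.`, `!` and `?`. It does not address the licensing question at all.

```python
import re

# Inline markup such as <i>...</i> or <b>...</b> that subtitle files often contain.
TAG = re.compile(r"<[^>]+>")


def extract_sentences(subtitle_text):
    """Return a list of sentences from SRT/VTT subtitle text (naive splitting)."""
    normalized = subtitle_text.replace("\r\n", "\n").strip()
    cue_texts = []
    # Cues are separated by one or more blank lines in both SRT and VTT.
    for block in re.split(r"\n\s*\n", normalized):
        rows = [row.strip() for row in block.split("\n") if row.strip()]
        # The timing line contains "-->"; everything after it is the cue text.
        timing_index = next((i for i, row in enumerate(rows) if "-->" in row), None)
        if timing_index is None:
            continue  # e.g. the "WEBVTT" header or a NOTE block
        text = TAG.sub("", " ".join(rows[timing_index + 1:])).strip()
        if text:
            cue_texts.append(text)
    # Join cues first, because one sentence can span several cues.
    joined = " ".join(cue_texts)
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", joined) if s.strip()]


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:03,000\nHello there.\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\n<i>How are</i>\nyou today?\n"
    )
    for sentence in extract_sentences(sample):
        print(sentence)
```

A real importer would still need to filter out things like speaker labels, sound descriptions in brackets, and sentences that are too long or contain digits.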

I was also told about this site, which has already scraped large sources of text, including OpenSubtitles:

http://opus.nlpl.eu/

This will probably be interesting when we open the conversation with the legal team about fair use of non-Wikipedia sites.