I brought up the idea of using this corpus, and I still believe we can use it. Here is why I think that is true.
UN documents are in the public domain; we can all agree on that.
With respect to copyright and database rights, the corpus as a whole is protected by both. There is copyright in the alignment work that turns the documents into a parallel corpus, but we don’t use it as a parallel corpus. Merely OCRing UN documents can’t create additional copyright, because that doesn’t meet the threshold of originality (i.e. it’s mechanically produced work with some proofreading, and proofreading alone can’t create additional copyright). As for the sui generis database right that applies in the EU, our use of the corpus is fine under it too. The database right in this corpus could be invoked only if we reproduced its parallel properties; a mere collection of UN documents doesn’t create additional database rights, since it is just a reproduction of the UN database (i.e. a database right requires independent database sources).
Even if all of this isn’t enough for us to conclude that our usage of the corpus complies with PD requirements (we use only the PD UN documents, not a derivative work of them, since such a work lacks the required threshold of originality), the terms on the corpus website seem pretty permissive. There isn’t even a requirement to provide a copy of them, and the only obligation is to acknowledge the UN as the source, which we already do with the source field (and which arguably wouldn’t be needed anyway, as the documents are PD):
The following disclaimer, an integral part of the United Nations Parallel Corpus, shall be respected with regard to the Corpus (no other restrictions apply):
- The United Nations Parallel Corpus is made available without warranty of any kind, explicit or implied. The United Nations specifically makes no warranties or representations as to the accuracy or completeness of the information contained in the United Nations Corpus.
- Under no circumstances shall the United Nations be liable for any loss, liability, injury or damage incurred or suffered that is claimed to have resulted from the use of the United Nations Corpus. The use of the United Nations Corpus is at the user’s sole risk. The user specifically acknowledges and agrees that the United Nations is not liable for the conduct of any user. If the user is dissatisfied with any of the material provided in the United Nations Corpus, the user’s sole and exclusive remedy is to discontinue using the United Nations Corpus.
- When using the United Nations Corpus, the user must acknowledge the United Nations as the source of the information. For references, please cite this reference: Ziemski, M., Junczys-Dowmunt, M., and Pouliquen, B., (2016), The United Nations Parallel Corpus, Language Resources and Evaluation (LREC’16), Portorož, Slovenia, May 2016.
- Nothing herein shall constitute or be considered to be a limitation upon or waiver, express or implied, of the privileges and immunities of the United Nations, which are specifically reserved.
So there is absolutely no reason why we can’t use it.