New Zealand parliament corpus

What about the New Zealand Parliament's Hansard?
It has some great lines like “OK Boomer”, but it would possibly be even more useful as a source if an NZ Māori dataset is started.

The website says:

We encourage you to make use of the content on Parliament’s website. To do this, we license content that we own the copyright to under the most open Creative Commons licence available. This licence is called Creative Commons Attribution, also known as CC-BY. The terms of this licence are set out below.

and

Content not covered by the Creative Commons licence
Public domain content
Some content is not covered by copyright. You are free to re-use this kind of content without a licence. This includes:

  • Government bills
  • Parliamentary debates (Hansard)
  • Reports of select committees

It looks like Australian parliamentary documents are licensed under CC 3.0. (NSW says you can’t use their documents to satirise them… what a self-confident parliament.)

The Canadian Parliament makes its debates available in machine-readable format and calls them “open data”. Its glossary defines open data as:

Structured data that is machine-readable, freely shared, used and built on without restrictions.

We will add this question to the agenda of our next legal meeting, since we have been asked to consult on each corpus individually.

Cheers.

After consultation with our legal team:

We can’t put the CC-BY data in Common Voice, but the website also says:

    Content not covered by the Creative Commons licence - Public domain content

    You are free to re-use this kind of content without a licence. This includes:

      • Government bills
      • Parliamentary debates (Hansard)
      • Reports of select committees

We can use those three public domain categories, but we should be careful not to stray outside them into CC-BY material. The bills are probably not very good examples of natural language, but Hansard might be useful.
