Question: Importing Catalan Parliament proceedings

jmontane · May 28, 2020, 6:54pm

Hi, we (the Catalan community) are looking for text sources under CC0 / Public Domain, and we found the Catalan Parliament proceedings. We have two questions about this source:

1st one:
I wonder if we can import sentences from this site into Common Voice, the same way we can import the Europarl corpus now.

The Catalan Parliament's legal note says (translated with Google Translate):

The information contained on this website has been included in order to make it easily available for personal or public use and may be reused without the need to obtain the express authorization of the Parliament of Catalonia or pay any price. However, the reuse of the information is limited to the following conditions:

that its meaning is not altered or distorted.
that it is not presented as an official version of the documents nor as a version made with the collaboration or with the endorsement of the Parliament of Catalonia.

that the Parliament of Catalonia be identified as a source of information.

specifying the date the data was updated or, if unknown, the date the data was obtained.

According to article 13 of the Spanish Intellectual Property Law (link in Spanish): The legal or regulatory provisions and their projects, the resolutions of the courts and the acts, agreements, deliberations and opinions of the public bodies, as well as the official translations of all the above texts are not the object of intellectual property.

So we hope the Catalan Parliament proceedings can be imported in a similar way to the European Parliament ones.

Now the 2nd question:
Months ago a voice corpus was built from Catalan Parliament data and published under the CC-BY 4.0 license. It contains about 90 hours of cleaned text+clips and 213 hours of text+voice of other quality. Can it be imported into Common Voice too?

nukeador · May 29, 2020, 10:28am

About the text corpus, I’ve added the question to the next meeting with our legal team (next week). I’ll follow up once I have more information.

About the second one: if this is already an open text+voice dataset, it can be used directly for #deep-speech, since Common Voice doesn’t need to validate anything. @Christos, maybe this is something we can list in the other sources section on the site?

alvaromp · June 1, 2020, 12:54pm

Following up on this topic: for Catalan in the Valencian accent, I found another corpus with texts from 1257 to the present day, which seems very useful for filling the Sentence Collector with more expressions and variety.

AVL replied to me by email that we can use as many sentences as we need without any problem. If Common Voice considers it suitable, maybe it could be included in the sources section.

nukeador · June 4, 2020, 4:34pm

Our legal team has reviewed this, and unfortunately we can’t use it as Public Domain (CC-0) because of the additional restrictions they list on their site:

Asking for attribution.
Asking for no commercial use.

Public Domain (CC-0) can’t have any kind of restrictions or requirements on its use.

Sorry for that.
