Are you importing the CC0-sentences from Tatoeba?

I see Tatoeba has been mentioned before, and that most of their sentences are licensed under CC-BY, which is not compatible with CV.

But now they have organized their CC0-sentences.

Tatoeba is providing a download with all their CC0-sentences. The file is updated weekly.

Currently it contains over 340,000 sentences in the Kabyle language!
And over 14,000 in various other languages.

I guess you have to do some checking on content/quality and actual copyright on all those sentences in Kabyle…

But could the others be imported every week into Sentence Collector, ready for review?

Currently it seems that a few thousand sentences are added each week. And this might increase, I hope … :slightly_smiling_face:
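For the weekly import suggested above, the filtering step might look roughly like the sketch below. It assumes the CC0 download is a tab-separated file with a sentence id, a language code, and the sentence text in the first three columns; I have not verified that layout, so treat it as an assumption. The function name `filter_sentences` and the sample rows are made up for illustration, and `kab` is the ISO 639-3 code for Kabyle.

```python
import csv
import io
from collections import Counter


def filter_sentences(lines, skip_langs=frozenset({"kab"})):
    """Yield (sentence_id, lang, text) tuples, skipping the given languages.

    Assumes tab-separated rows: id, language code, sentence text.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # ignore malformed or empty lines
        sentence_id, lang, text = row[0], row[1], row[2]
        if lang in skip_langs:
            continue  # e.g. hold Kabyle back for separate checking
        yield sentence_id, lang, text.strip()


# Tiny made-up sample standing in for the weekly download.
sample = io.StringIO(
    "1\tkab\tAzul fell-awen.\n"
    "2\tepo\tSaluton!\n"
    "3\tnob\tHei på deg.\n"
)

kept = list(filter_sentences(sample))
per_lang = Counter(lang for _, lang, _ in kept)
print(per_lang)
```

The remaining rows would still need the usual review before being accepted; this only shows how the non-Kabyle sentences could be separated out each week.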


Interesting, do you have a link for those?

We’d definitely need a review process for this. I doubt reviewing a few thousand sentences per week in Sentence Collector would be very scalable.

Not really from Tatoeba.
Some contributors put their sentences on GitHub under CC0 in their own account before using them on Sentence Collector or other projects. Some of them are, or were, not aware of licensing. And after publishing a sentence on GitHub under a free licence (CC0), not all users realize they should keep the same licence when they reuse their own sentences in other projects that use licences other than CC0.

Hello! Sorry, I don’t know English. Can you use a translator?

(Translated from Esperanto) If you still need the link, I am posting it here: tatoeba.org/downloads

Looked into this and found, in addition to the download section under “Sentences (CC0)” on the downloads page, instructions on their wiki for Tatoeba users on how to license their contributions as CC0.

I would be interested in contributing to Tatoeba (as CC0) and having my sentences imported to Common Voice, rather than adding them directly to Sentence Collector, since my work could then be used for other things as well.
