Why not use texts that are CC-BY? (The point of AI is to free us from copyright)

Here is why I think we can use CC-BY text:
The implication is that we would end up with datasets under CC-BY, but as far as I understand, that license shouldn’t pass on to Deep Speech or to any other AI trained on the data.
I asked that question on Quora regarding machine translation and got a reasonable answer.

Is it legal to train neural networks to build machine translations based on copyrighted texts?

Yes. It makes no more sense for that to be illegal than it makes for it to be illegal for a human being to study how to translate texts by comparing copyrighted texts to translations of them.
Now, if you use that neural network to then translate a copyrighted text, the translation is subject to the original copyright, just as would be the case if you learned to translate the language yourself and then translated the text.

Training an AI is like teaching a child: I can use copyrighted materials to teach a kid a skill, and he or she is the sole owner of the fruits of that skill.

The point here is that we want to release the full dataset, including the texts, and we want to make sure it can be used in any situation.

So yes, we care about the license of the Deep Speech models, but also about the license of our full dataset. That’s why a less restrictive license serves our current needs better.

