Extending our sentence collection capabilities

Hi,

As I commented on GitHub, the extraction process was reviewed and validated with Mozilla's legal team and also communicated to Wikipedia. Our dataset remains Public Domain worldwide. The process is described in this topic (a maximum of 3 random sentences are taken from each article).

If you have concrete concerns, we can add them to a list and raise them with our legal team at our next meeting.

Thanks for your feedback!