Extending our sentence collection capabilities

nukeador · September 3, 2019, 11:26am

Hi,

As I commented on GitHub, the extraction process was reviewed and validated by Mozilla's legal team and was also communicated to Wikipedia. Our dataset remains Public Domain worldwide. The process is described in this topic (max. 3 random sentences from each article).
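
For anyone wondering what "max. 3 random sentences from each article" looks like in practice, here is a rough sketch of the idea. It is not the actual extraction tool: the function name `sample_sentences`, the naive punctuation-based sentence splitting, and the optional `rng` parameter are all assumptions made for illustration. Only the limit of 3 comes from the process described above.

```python
import random
import re

# The per-article limit stated in this post.
MAX_SENTENCES_PER_ARTICLE = 3


def sample_sentences(article_text, limit=MAX_SENTENCES_PER_ARTICLE, rng=None):
    """Return at most `limit` randomly chosen sentences from one article.

    Hypothetical helper: sentences are split naively on '.', '!' or '?'
    followed by whitespace, which is an assumption for this sketch.
    """
    if rng is None:
        rng = random.Random()
    parts = re.split(r"(?<=[.!?])\s+", article_text)
    sentences = [p.strip() for p in parts if p.strip()]
    # Sample without replacement; short articles yield fewer sentences.
    return rng.sample(sentences, min(limit, len(sentences)))
```

An article with fewer than 3 sentences simply contributes everything it has, and no sentence is picked twice from the same article.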

If you have concrete concerns, we can add them to a list and raise them with our legal team at our next meeting.

Thanks for your feedback!
