Hi, I’m looking forward to the Portuguese language being added, and I want to contribute. How can I help? It seems we need 5,000 sentences, which wouldn’t be hard: there is a corpus with 130k sentences for Portuguese, developed by a speech research team, which we could use. For informal language we could write our …

I’m sure @reuben can give some hints on Portuguese :). BTW, please make sure your 130k dataset can be licensed as CC-0; otherwise it cannot be used for Common Voice. Codigo_Logo_Programacao_e_Inteligencia_Artificial: Also, do you guys have plans to add the Spoken Wikipedia to the datasets…

Well, so we can use Wikipedia, right? OK, I’ll look into it. With regard to informal language, we could write some sentences to reach 5,000 and then get started.

Codigo_Logo_Programacao_e_Inteligencia_Artificial: “Well, so we can use the wikipedia right” Wikipedia should not be the only component of your dataset, but it can be part of it. Please ensure you only extract CC-0 content.

Here is his code (tested only on French so far): https://github.com/jeanbaptisteb/commonvoice-fr/blob/master/Wikipedia_CC0.py

As @lissyx said above, there’s a script to extract content from Wikipedia under the CC0 licence. I still need to fix a couple of bugs in it, and maybe make it simpler to use. I’ll try to check this weekend whether it’s possible to extract content from the Portuguese Wikipedia with this script. Bu…

Please keep up these efforts to collect public-domain sentences. As soon as our sentence collection tool is ready, we should be able to use it to submit, validate, and review them so they can be incorporated into the database.

I’ve created this topic to centralize questions: “Readme: How to see my language on Common Voice”. This information is also now available on the About pages of the Common Voice website. Please help us to localise this by join…

How can I send sentences to contribute?

Common Voice

nukeador (Rubén Martín [❌ taking a break from Mozilla]) September 5, 2018, 3:28pm 8

I’ve created this topic to centralize questions
