Basque dataset ready

sentence-collection

(Txopi) #1

I know the sentence collection tool is coming, and that it will help with sentence upload and review. But during the last two months we have collected more than 6,000 CC0 sentences in Basque, and we have already checked, fixed, cleaned and reviewed them. The dataset is small but covers diverse grammar and vocabulary.
Here is a pull request with those first Basque sentences.
NOTE: the website is already translated into Basque.


(Rubén Martín) #2

Thanks for this work @txopi

Note that we are currently improving our process for including sentences in the corpus, and we are working on the final guidelines for accepting sentences, which will be applied to the sentence collection tool.

This means we will probably need to wait a bit before adding sentences, so that they go through this improved process and we can make sure they are 100% useful for training the speech engine. But Basque will definitely have a lot covered already thanks to this work :slight_smile:

Cheers.


(Txopi) #3

Did you change your decision? Some days ago I noticed that Basque appears as a launched language on voice.mozilla.org!

Nobody seems to read the Slack chat, so I decided to ask here for some fixes to the sentences (found in part thanks to the Sentence Collector). I also asked for the Basque accents to be loaded.

I think these changes should be made before Basque speakers start participating in Common Voice. @nukeador, shouldn’t we take a step forward or backward as soon as possible and resolve the situation of the Basque language? Please help!