Support needed to get more sentences in Persian

nukeador · May 11, 2020, 11:18am

Hello everyone,

I’ve just noticed that Persian (fa) has 276 hours of validated voice recordings:

https://voice.mozilla.org/languages

But I’ve just checked and Persian only has 12K sentences in the system, which means people are recording the same sentences again and again. We know this is not ideal for the quality of the dataset.

This is a call to action to Persian speakers with technical knowledge to help with the Persian Wikipedia extraction:

Important: Please do not use the Sentence Collector to send Wikipedia sentences; we must use the process described in the link above.

This would allow the project to have way more sentences without repetitions, increasing the quality of the Persian dataset.

Thanks!

isomorph70 · May 11, 2020, 1:11pm

I shared it with a Persian-speaking computer scientist friend.

vox · May 16, 2020, 10:37am

I’ve followed this principle and submitted a lot of sentences to the Common Voice Sentence Collector:

Extending our sentence collection capabilities: We are able to use sentences from Wikipedia as long as we don’t extract more than 3 random sentences per article.

But most of them were rejected without any reason given.
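
For illustration only, the rule quoted above (no more than 3 random sentences per article) could be sketched roughly as follows. This is not the official extraction tool or process; the function names, the naive sentence splitter, and the input format (a dict mapping article title to article text) are all assumptions made for this example.

```python
import random
import re

MAX_PER_ARTICLE = 3  # limit taken from the rule quoted above


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!', '?' or the
    # Persian question mark. A real pipeline needs language-specific rules.
    parts = re.split(r"(?<=[.!?؟])\s+", text.strip())
    return [p for p in parts if p]


def sample_sentences(articles, max_per_article=MAX_PER_ARTICLE, seed=None):
    # articles: hypothetical dict of {title: plain article text}
    rng = random.Random(seed)
    picked = {}
    for title, text in articles.items():
        sentences = split_sentences(text)
        # Never take more than the limit, even for long articles.
        k = min(max_per_article, len(sentences))
        picked[title] = rng.sample(sentences, k)
    return picked
```

Following the rule in a script like this does not make it acceptable to submit the results through the Sentence Collector; it only shows what the per-article limit means.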

nukeador · May 18, 2020, 11:12am

vox, do you mean you have sent Wikipedia sentences to the Sentence Collector?

If that’s the case, we will have to delete them. As you can see in the linked topic, Wikipedia sentences come with legal requirements that we must meet by using the special process described there.
