We have finally finished reviewing the bulk sentence request process. Thank you for your patience while the team worked on it. After a request is submitted, the new process consists of the following steps:
The community coordinator will review the request to make sure that all fields are completed, then send the file to legal for review.
If legal approves, the community coordinator will contact the relevant community to arrange quality assurance checks. This process involves the following:
50 sentences are selected at random and marked on the submitted file for quality checking. Quality assurance comments should be entered in the “Sentence quality assurance feedback” column.
If the reviewed sentences meet the quality assurance criteria, the bulk sentences will be merged.
If the review reveals that some sentences are not of sufficiently high quality and do not meet the criteria (e.g. poor spelling or grammar, dubious open source provenance, or violation of community guidelines), the reviewer should inform the community coordinator and the submitter so that the identified issues can be resolved and the file resubmitted.
If legal does not approve, the community coordinator will communicate with the submitter and provide reasons for rejection. The submitter will work with the community coordinator for resubmission.
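For reviewers who want to reproduce the random selection step, here is a minimal sketch. It assumes the sentences have already been read from the template into a Python list; the function name `select_for_qa` and the optional `seed` parameter are hypothetical illustrations, not part of any official MCV tooling.

```python
import random

def select_for_qa(sentences, k=50, seed=None):
    """Hypothetical helper: return sorted row indices of up to k sentences,
    chosen uniformly at random without replacement, for quality checking."""
    rng = random.Random(seed)           # a fixed seed makes the selection repeatable
    k = min(k, len(sentences))          # small files: every sentence gets checked
    return sorted(rng.sample(range(len(sentences)), k))

if __name__ == "__main__":
    rows = [f"Sentence {i}" for i in range(200)]
    picked = select_for_qa(rows, seed=42)
    print(len(picked), picked[:5])
```

Sorting the indices simply makes it easier to mark the chosen rows in the file from top to bottom; passing a seed lets the coordinator and the reviewer arrive at the same sample independently.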
Community members should submit only sentences that are in the public domain or, alternatively, sign the CC0 waiver form, to avoid delays in the updated process.
Thank you for your cooperation; this is to ensure the continued high quality of the sentences.
Hey @gina, thank you for the update. Just some quick questions:
What is the medium of the initial and final request/submission? File upload via web form? GitHub? Is there a sample?
Not all submissions are equal with respect to sources. They can be books that have entered the public domain, self/community-generated text, etc. In the latter case, where the sentences are already quality checked, most of these steps seem unnecessary.
In case of “questionable” legal status, wouldn’t it be logical to check with legal before preparing the resource file?
I’m asking because in our workflow our community aims for 0 (zero) errors; multiple people read ALL sentences before submitting. I’ve had several rejections in the past after such hard work…
The process is still the same: files are uploaded via the MCV website using the updated template.
Yes, that is indicated in the āSourceā column of the template.
Given the steps involved in quality assurance, we prefer that legal approves first.
We’ve had community members complain about bad sentences in the corpus; this is to ensure that we accept only high-quality sentences.
Hi @neouyghur, the file has been sent to legal for copyright license review, a process that may require some time. I will update you on the subsequent steps as soon as I receive feedback.
Hi @neouyghur, this process may take some time. I will email you the next steps as soon as I receive feedback. Kindly note that after the legal review, I need to find a quality assurance person to check the file, which may take more time, so please be patient as we go through this process. I will keep you updated.
Hi @gina, thanks for your reply. I am one of the members of the Uyghur community who is actively contributing to the CV Uyghur dataset. I can assist you in finding the right person to evaluate the sentences written in Uyghur.
@gina, I’m adapting our collaborative Google Sheet template to the new style, but I have a question about the template:
In your examples, you used real names as the source. Is this mandatory?
We are working as a community, and we switched to using nicknames after the TTS hit. I’m more or less a public figure, but I don’t want to expose any other people by their real names… “Community member, copyright waived” would suffice, I’d presume…