Hi,
I’ve recently got started with the sentence collector (I’m reviewing Belarusian sentences), and I have a few questions:
(1) The review tool loads the first 10K sentences that I haven’t previously reviewed, ordered by upload time. However, there are ~100K Belarusian sentences available for review in total. To get access to the rest, I would have to upvote or downvote each of the first 10K sentences, which is not particularly efficient: there is a lot of noise, and few sentences are good. I found a way to download all sentences locally and then send upvotes programmatically (with kinto-http, mimicking the logic in sentences-meta.js) to the sentence IDs of my choice. Is that something I’m allowed to do, or should we instead fix the web UI by adding pagination beyond 10K and sorting / filtering options?
(2) In some of the Belarusian sentences that have already been approved, there are formatting issues, most prominently Latin i (U+0069) instead of Belarusian і (U+0456). Again, I’m able to edit the sentences programmatically inside the Kinto collection (example), but I need to know whether that is legitimate or whether the preferred process is different.
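To make the kind of fix concrete, here is a minimal sketch of the text normalization only (not the Kinto update itself). The function name `fix_latin_i` and the rule "only touch words made up of Cyrillic letters plus Latin i/I" are my own assumptions for illustration, not anything the sentence collector defines.

```python
import re

# Latin i/I (U+0069/U+0049) -> Cyrillic Byelorussian-Ukrainian i/I (U+0456/U+0406)
_TO_CYRILLIC = str.maketrans({"i": "\u0456", "I": "\u0406"})

_WORD = re.compile(r"[^\W\d_]+")               # runs of letters in any script
_FIXABLE = re.compile(r"[\u0400-\u04FFiI]+")   # Cyrillic letters and Latin i/I only
_HAS_CYRILLIC = re.compile(r"[\u0400-\u04FF]")


def fix_latin_i(sentence: str) -> str:
    """Replace Latin i/I with the Cyrillic look-alike in Cyrillic text.

    Assumption: a word is safe to fix when it contains nothing but
    Cyrillic letters and Latin i/I. Words with other Latin letters
    (for example brand names) are left unchanged.
    """
    # Sentences without any Cyrillic letter are not touched at all.
    if not _HAS_CYRILLIC.search(sentence):
        return sentence

    def _repl(match: re.Match) -> str:
        word = match.group(0)
        if _FIXABLE.fullmatch(word):
            return word.translate(_TO_CYRILLIC)
        return word

    return _WORD.sub(_repl, sentence)
```

A standalone Latin "i" (the conjunction) is also converted as long as the sentence contains Cyrillic elsewhere. The results would still need a manual look before writing anything back to the collection.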
(3) There also exist approved sentences that look bad to me, e.g. those containing dialectal or archaic words that are no longer used in modern standard Belarusian (Фаэтон гоцаў і гутаўся; Ай, каб не дарагоўля — памякчэў ба народ), or those which are not full sentences but rather nominal phrases (Спыненне дзеяння пасведчання аб дзяржаўнай рэгiстрацыi; Выхадныя звесткi друкаваных выданняў). These items cannot be unapproved, as they already have 2 upvotes. Is there anything that can be done about them?
(4) Tatoeba has several thousand Belarusian sentences under CC0, and most of them fit the criteria (i.e. short enough, no digits, etc.). Are there any plans to import CC0 data from Tatoeba in a centralized manner, or is it allowed to upload the CC0 sentences of my choice from Tatoeba into the sentence collector?
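A rough sketch of the kind of pre-filter I mean by "fit the criteria" is below. The 14-word limit and the name `fits_criteria` are assumptions for illustration; the sentence collector's actual validation rules may differ and cover more cases.

```python
import re

# Assumed limit, not taken from the sentence collector's real validation rules.
MAX_WORDS = 14

_DIGIT = re.compile(r"\d")  # matches digits in any script


def fits_criteria(sentence: str, max_words: int = MAX_WORDS) -> bool:
    """Return True if the sentence is non-empty, short enough, and digit-free."""
    words = sentence.split()
    if not words or len(words) > max_words:
        return False
    if _DIGIT.search(sentence):
        return False
    return True
```

This would only narrow down the Tatoeba candidates; they would still go through the normal review in the sentence collector.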
Thanks in advance for any comments.