Somebody submitted a lot of Ukrainian sentences, all of which are inappropriate.
I’ve reviewed them, but there is nothing to approve.
Is it possible to clear them all?
My second question: where can I find the already approved sentences so I can have a look at them?
Currently there are 9289 unreviewed sentences.
My username on the Sentence Collector service is artem and my GitHub username is a-polivanchuk.
Thanks for the link! I quickly went through the list and see many incorrect sentences there as well, which sound completely unnatural. I guess they should be removed too.
Can I clean them up and create a PR later?
Thanks. Do I understand correctly that you went through all 9000 sentences and there are none left that warrant an approval?
Can you give me some examples of these sentences and explain why they are not natural?
For the approved sentences a PR is not enough, as the next export will just export them again. We would also need to delete them from the Sentence Collector database. I have a script for that, if you could give me a text file with all the sentences to delete, one per line. Running an export after that will also delete them from the sentence-collector.txt file.
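To illustrate the "one sentence per line" file format, here is a minimal Python sketch of filtering an export file against such a deletion list. This is not the actual script mentioned above (which works on the database and is not shown in this thread); the function names are hypothetical and it only demonstrates the matching step on plain text lines.

```python
from pathlib import Path


def load_sentences(path):
    """Return the non-empty, whitespace-stripped lines of a UTF-8 text file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def remove_sentences(export_lines, to_delete):
    """Return the export lines that are not in the deletion list, keeping order.

    Matching is exact after stripping surrounding whitespace (an assumption;
    the real script may normalize differently).
    """
    blocked = {sentence.strip() for sentence in to_delete}
    return [line for line in export_lines if line.strip() not in blocked]
```

For example, `remove_sentences(load_sentences("sentence-collector.txt"), load_sentences("to-delete.txt"))` would give the sentences that remain, where `to-delete.txt` is a placeholder file name.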
Yes, that’s what I mean. The sentences are all very similar and relate to some political discussion. There are even sentences in Russian. It looks like the list was just copied and pasted without any additional processing or review.
Examples:
Порошенко так не робив як Ви, Порошенко телефон у мене не забирав. (Mentions the ex-president’s surname and refers to a mobile phone)
Вот этот шаг Вы можно делать сейчас без Верховной Рады. (This sentence is absurd and it is in Russian)
Ми говоримо про повагу, Олег Валерійович, давайте триматися поваги. (Mentions the first name and patronymic of some politician and is grammatically incorrect)
Я просто говорю як пропозицію. (Grammatically incorrect and unnatural)
Перепрошую, одну секундочку, тому що дійсно друге читання. (Unnatural, and the context is truncated)
At first, I tried to catch and approve good sentences, but then realized it was a waste of time.
Got it! Regarding already approved sentences, I’ll prepare the txt file and provide it to you when it’s ready.
They just have bad grammar, or were said by a person with A2/B1-level language proficiency.
Порошенко так не робив як Ви, Порошенко телефон у мене не забирав.
↓ A native speaker would say something like this (I’m guessing, as the meaning of the sentence is not clear at all)
Порошенко так як Ви не робив, наприклад телефона у мене не забирав.
§
Вот этот шаг Вы можно делать сейчас без Верховной Рады.
Yes, this was said in Russian (and with mistakes in Russian).
§
Ми говоримо про повагу, Олег Валерійович, давайте триматися поваги.
↓ The first part sounds weird, but the last part is just wrong.
Ми говоримо про повагу, Олег Валерійович, давайте поважати один одного.
§
Я просто говорю як пропозицію.
↓ Grammatically incorrect
Я висуваю пропозицію.
§
Перепрошую, одну секундочку, тому що дійсно друге читання.
↓ I’m just guessing the meaning, but it seems it should be something like:
Перепрошую, хвилиночку! Це дійсно друге читання.
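As a rough aid for spotting the Russian sentences mentioned above, a simple heuristic is to flag any sentence containing letters that exist in the Russian alphabet but not in the Ukrainian one (ы, э, ъ, ё). The sketch below is only an assumption about how such a pre-filter could look, not something used by the Sentence Collector; `looks_russian` is a hypothetical helper, and it will miss Russian sentences that happen to contain none of these letters.

```python
# Letters used in Russian but absent from the Ukrainian alphabet.
RUSSIAN_ONLY_LETTERS = set("ыэъёЫЭЪЁ")


def looks_russian(sentence):
    """Return True if the sentence contains at least one Russian-only letter."""
    return any(ch in RUSSIAN_ONLY_LETTERS for ch in sentence)


examples = [
    "Вот этот шаг Вы можно делать сейчас без Верховной Рады.",
    "Я просто говорю як пропозицію.",
]
flagged = [s for s in examples if looks_russian(s)]
# flagged contains only the first (Russian) sentence
```

Flagged sentences would still need a human check; the heuristic says nothing about grammar or naturalness.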
nukeador
(Rubén Martín [❌ taking a break from Mozilla])
It’s interesting that we got 9K wrong sentences.
@mkohler if all of them are confirmed to be wrong, yes, let’s delete them.
If you’re referring to these ones (a bit more than 1k), then yes, most of them are truncated or sound weird.
I think it’s because most of these sentences are taken from unedited transcripts of working sessions (without any proofreading or even formatting) rather than from organized public speaking. See the attached file as an example (found by searching for the string “Чи немає необхідності в доповіді і обговоренні, колеги? Ніхто не наполягає?”).
I can share with you my experience with my community (Kabyle, a minor language) in collecting more sentences and recruiting more contributors.
It’s better to look for graduates of language departments, as I did. Onsite workshops and talks about CV and SC can also help to provide assistance. But social communication (pages, blogs, videos, …) and sometimes traditional media (TV, newspapers, radio…) are the best way to reach more people.
@artem can you confirm that these are still the criteria? Should the date be extended to today?
With these requirements I’m getting 6592 records to delete. When extending the date to today I’m getting 6672, which is the total number of unreviewed sentences. Just to make sure: are there sentences not approved by “artem” that would nevertheless be valid?
Feel free to go through these and tell me which ones to delete. A text file with one sentence to delete per line would be perfect, as I have a script for that. For the new sentences, please also give a quick indication of why they should be deleted, if needed.