Recent Persian sample sentence submissions

I recently noticed a huge number of sentences (>200k) waiting in the review queue, presumably added from a dictionary for Persian.
Although this is generally not a bad thing, most of the words are obscure or quite rare, and there are a lot of spelling mistakes. The problem is made worse by the fact that the existing sentences are already quite biased: colloquial sentences make up less than five percent, compared to (usually old) textbook and written Persian, which differ a lot in form. This huge review queue makes it impossible to add more diverse sentences on a small but frequent basis, given the small contributor base.

What would a proper solution be?

This is in the Sentence Collector, right? If so, we can delete those if they do not provide value. Do they all have the same “source” displayed? If so, what is it? And are all the “sentences” from that source to be removed?

Yes, they are from the Sentence Collector. I can’t say that they don’t provide any value, but at the current rate reviewing them all is almost impossible, and the quality is not high enough for bulk submission, so they might be doing more harm than good.

All the sentences I have been reviewing recently mention “self-prepared sentences”, which I suspect make up a big portion of those 250k submissions, considering they are all dictionary-like entries (but of course I can’t be sure, since I cannot see all the sentences).

It would be nice if we could temporarily “hold” these submissions for later review and revision (maybe by simply putting them in a separate directory that is not exported to CV?), but if that’s not possible I think removing them might be the only option at the moment, provided that this source is actually the cause of this huge queue.

Currently there are 237,663 sentences left to review. There are 79,269 with the source indicated as “self-prepared sentences”. These were uploaded in 12 different submissions (IDs just for my own future reference):

  • 6d37f2c0-12dc-4370-a4db-a3063a8954b6
  • 8b6d5a9a-6512-4f51-81ad-fb5ff312de9a
  • 282c3350-78b6-4943-9d35-75953a9a4346
  • c14825f2-9c3d-4fe3-acae-5f8522cd3b03
  • c938ea72-74a7-43d3-85b1-f355ab526c84
  • adbf5670-da0c-4468-b108-c0bd3c3888dd
  • da9653d4-ba14-401f-8811-c7f6b62390b0
  • c8024715-2a3e-4550-859b-d8048ac358ed
  • 6bb2b75c-887d-44b2-89d0-0788d13ad047
  • 33deb172-d814-4ce3-ad35-b6f34e134322
  • 60814880-3f6b-4605-9f40-faa9ee70b38f
  • 4acb19ce-120a-43e3-be2a-c78d723526c2

So there must be other sources that were recently added. As I do not have access to the database, I can’t say which ones those are, though.

It would be nice if we could temporarily “hold” these submissions for later review and revision (maybe by simply putting them in a separate directory that is not exported to CV?)

That is currently not possible. Something like a quarantine might be a good idea for the future (just might), but currently there is no flag for that, and Sentence Collector has a single database. Though that’s just a technical limitation, identifying the actual sources and submissions that contain these sentences is way trickier.

I see, thanks.

So we would have to delete these to be able to see the other (presumably huge) submissions?