High number of low-quality submissions in Sentence Collector makes reviewing boring

In the review queue for Swedish submissions in Sentence Collector, there are thousands of entries with the source “Project Gutenberg, with slight tweaks by me.” These seem to be based on texts with old-style grammar (such as plural verb forms that haven’t been used in Swedish since the first half of the last century, and word order that almost everyone would find odd nowadays) and old spelling, and they have not been tweaked nearly enough to be usable in Common Voice. There are also sentences with incorrect capitalization and spurious full stops in the middle of sentences.

It’s a real chore to go through and reject a majority of entries that all suffer from similar problems. It used to be possible to skip ahead in the queue and review later entries, which I miss now. Is there some other way to approve Swedish sentences while skipping the ones I’d rather not go through?

Hi there,

At this point there is not, no. However, I think the main question here is: what does the ratio of good sentences vs. bad sentences look like? If the full data source does not provide much value and just contributes to frustration during review, it might be worth it to remove not-yet-reviewed sentences completely. What do you think?

I just want to agree here.

Even in cases where old sentences are pretty similar to modern spelling and wording, they still often use commas in ways we don’t nowadays. This isn’t really an issue per se I guess, but you can very often hear people reading these sentences awkwardly, making “pauses” in ways you never do in normal speech, all because of the (by modern standards) awkwardly placed comma. If these commas weren’t there, the recordings would probably have higher quality.

An example that works in English as well could be “He thought, that this was an awkward comma”. We used commas like this back in the 50s or so, but not anymore. People really stumble on commas like these.

I think my suggestion would then be to remove all remaining, not yet reviewed sentences from that source. @ftyers would you agree?

Agree with @mkohler, the best thing to do is remove sentences from that source.

I have now deployed a version deleting these sentences. Note that deployment might take a while.

Has this been deployed by now? In the queue I still see 1000 similar sentences with the same source, “Project Gutenberg, with slight tweaks by me.”, and outdated (or incorrect) grammar. Are these new entries from the same user or the old ones?

Hmm, looks like it’s the old ones. I’ll have a look in the next few days.

For reference: https://commonvoice.mozilla.org/sentence-collector/sentences/sv-SE?source=Project%20Gutenberg,%20with%20slight%20tweaks%20by%20me.

I understand the frustration that can arise from encountering a large number of low-quality submissions while reviewing content in the Sentence Collector program. It can make the reviewing process tedious and discouraging, and may even lead you to lose interest in participating.

One way to address this issue is to take breaks and avoid reviewing too many submissions in one sitting. Another approach is to provide constructive feedback to submitters, offering suggestions for how they can improve their writing. This can help them enhance their skills and contribute to the overall quality of submissions.

Additionally, it’s important to report any spam or low-quality submissions to the relevant authority, so that appropriate action can be taken, such as removing the content or banning the user responsible.

Remember, as a reviewer, you play a crucial role in maintaining the accuracy and usefulness of the Sentence Collector program. Your efforts help ensure that the collected sentences are informative and reliable. So, don’t get discouraged and continue to do your best.