Need help batch-deleting 300k+ inappropriate samples in Sentence Collector

This is a follow-up to my post about the vast number of single words.

The Ukrainian Sentence Collector now holds 310k+ samples in total, and almost all of them are just single words. Someone uploaded an entire dictionary, giving https://github.com/brown-uk/dict_uk/releases and https://github.com/brown-uk as the source.
It is now nearly impossible to reject all of them manually. Please help our small community keep developing.
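
In case it helps whoever looks into this, here is a minimal sketch of how single-word samples could be separated from real sentences in an exported list. It assumes the samples are available as plain strings; the function name `split_single_words` and the example data are hypothetical and are not taken from the actual Sentence Collector code or database.

```python
def split_single_words(samples):
    """Separate single-word samples from multi-word sentences.

    Hypothetical helper: `samples` is assumed to be an iterable of strings.
    """
    single_words, sentences = [], []
    for sample in samples:
        text = sample.strip()
        if not text:
            continue  # ignore empty lines
        # str.split() with no arguments splits on any run of whitespace
        if len(text.split()) == 1:
            single_words.append(text)
        else:
            sentences.append(text)
    return single_words, sentences


# Made-up example data, only for illustration
samples = ["кіт", "Сьогодні гарна погода.", "  слово  ", "Я люблю читати книжки."]
single_words, sentences = split_single_words(samples)
print(len(single_words), len(sentences))
```

A whitespace-based check like this would only flag candidates; someone would still need to decide how to remove them on the server side.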


There are also a lot of sentences from a couple of religious books that are harmful in my view (no, not the Bible; even worse).

If you are talking about approved sentences, you can report them on the listen/speak pages. In Sentence Collector you can dislike them.

Also here is info about editing approved words


Generally I don’t think that it’s fun, efficient or useful to have the full dictionary there. If it indeed should be helpful for the model (which I doubt), this is something that should be imported outside of the Sentence Collector.

I have removed those from the Sentence Collector database.

See my question in https://github.com/common-voice/sentence-collector/issues/425.
