Hi! I make a plugin called Wingman Jr. that blocks NSFW images for users. It uses machine learning, and I’m constantly improving the model. I’d like to give users a way to provide feedback about the image blocking so I can make it better for everybody!
Specifically, what I have in mind is something like a “report recent images” feature that would take some number of the last images the user has viewed and submit that data to a server, where I would review it and incorporate it into a dataset for training. Right now that dataset is private, but I would like to keep the option of opening it up someday - I think the lack of a good dataset of both positives and negatives is holding back research on this type of model.
Now there are lots of ways this could be done well, and lots of ways it could be done poorly. Here’s what I thought would work well, but I’d like some feedback:
- Provide a clear privacy policy and disclaimer that make it clear that data is only collected when specifically submitted by the user.
- Collect URLs, not the actual image data. This is how datasets such as ImageNet are built.
- Exclude data URLs for the same reason.
- Clearly show the user a copy of the data that will be submitted, with a “Yes/No” decision before each submission - e.g. the list of URLs and any metadata. This way the user gets a last chance to review the submission and back out of anything they don’t want to send.
- Clearly state once again at the time of submitting the user data what will happen with the collected data.
I think if a plugin provided these bounds, I would feel OK with submitting data if I believed in what they were trying to do.
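To make the flow concrete, here is a rough TypeScript sketch of the client side. It is not what the plugin does today. All the names (`ImageReport`, `buildReport`, `previewReport`, `submitIfConfirmed`) and the payload fields are made up for illustration, and the actual upload is left as a callback since I haven't settled on a server API.

```typescript
// Hypothetical payload shape; the field names are illustrative only.
interface ImageReport {
  urls: string[];
  submittedAt: string; // ISO 8601 timestamp
}

// Data URLs carry the image bytes inline, so they are never reported.
function isDataUrl(url: string): boolean {
  return /^\s*data:/i.test(url);
}

// Build a report from recently seen image URLs (input is oldest first).
// Keeps at most `maxImages` entries, newest first, skipping data URLs
// and duplicates.
function buildReport(recentUrls: string[], maxImages: number, now: Date): ImageReport {
  const urls: string[] = [];
  for (const url of recentUrls.slice().reverse()) {
    if (urls.length >= maxImages) {
      break;
    }
    if (!isDataUrl(url) && urls.indexOf(url) === -1) {
      urls.push(url);
    }
  }
  return { urls, submittedAt: now.toISOString() };
}

// Exactly what would be sent, formatted for the Yes/No review dialog.
function previewReport(report: ImageReport): string {
  return JSON.stringify(report, null, 2);
}

// Nothing leaves the browser unless the user said "Yes".
// `send` is a placeholder for whatever upload mechanism ends up being used.
function submitIfConfirmed(
  report: ImageReport,
  confirmed: boolean,
  send: (body: string) => void
): boolean {
  if (!confirmed || report.urls.length === 0) {
    return false;
  }
  send(JSON.stringify(report));
  return true;
}
```

The idea is that the text from `previewReport` is shown in the confirmation dialog, and `submitIfConfirmed` is the only code path that ever calls `send`, so a "No" answer means no request is made at all.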
So what do you think? Good idea, bad idea? What pitfalls haven’t I thought of yet?
Thanks for the feedback!