User Submitted Data and Privacy?

zephyr · January 25, 2020, 6:12pm

Hi! I make a plugin called Wingman Jr. that blocks NSFW images for users. It uses machine learning, and I’m constantly improving the model. I’d like to give users a way to send feedback about the image blocking so I can make it better for everybody!

Specifically, what I have in mind is something like a “report recent images” feature that would take some number of the last images the user has viewed and submit that data to a server, where I would review it and incorporate it into a dataset for training. Right now that dataset is private, but I would like to keep the option of opening it up someday. I think the lack of a good dataset with both positives and negatives limits research on this type of model.

Now, there are lots of ways this could be done well or done poorly. Here’s what I thought would work well, but I’d like some feedback:

Provide a clear privacy policy and disclaimer that make it clear that data is only collected when specifically submitted by the user.
Collect URLs, not the actual image data. This is how ImageNet and similar projects approach the creation of such datasets.
Exclude data URLs for the same reason.
Clearly show the user a copy of the data that will be submitted, with a “Yes/No” decision before each submission - e.g. the list of URLs and any metadata. This way the user has a last chance to review and back out anything they don’t want to submit.
Clearly state once again at the time of submitting the user data what will happen with the collected data.
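
To make the points above concrete, here is a rough sketch of how the filtering and confirm-before-send steps might fit together. It is not the plugin's actual code: `FeedbackReport`, `buildReport`, `submitIfConfirmed`, and the `confirm`/`send` callbacks are all hypothetical names, and the real UI and server call would be plugged in where the callbacks are.

```typescript
// Hypothetical shape of a feedback report; field names are illustrative only.
interface FeedbackReport {
  urls: string[];
}

// Keep only http(s) URLs, so data: URLs (which embed the image bytes
// themselves) and other schemes are never included in a report.
function isReportableUrl(url: string): boolean {
  return /^https?:\/\//i.test(url);
}

// Build a report from recently viewed image URLs (assumed oldest first),
// dropping duplicates and keeping at most `maxItems` of the newest entries.
function buildReport(recentUrls: string[], maxItems: number): FeedbackReport {
  const unique = Array.from(new Set(recentUrls.filter(isReportableUrl)));
  return { urls: maxItems > 0 ? unique.slice(-maxItems) : [] };
}

// Send only after the user has been shown the exact payload and said yes.
// `confirm` stands in for the Yes/No dialog; `send` stands in for the upload.
function submitIfConfirmed(
  report: FeedbackReport,
  confirm: (preview: string) => boolean,
  send: (payload: string) => void
): boolean {
  const payload = JSON.stringify(report, null, 2);
  if (report.urls.length === 0 || !confirm(payload)) {
    return false; // nothing to send, or the user backed out
  }
  send(payload);
  return true;
}
```

The string shown to the user is the same string that gets sent, so the preview cannot drift from what is actually submitted.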

I think if a plugin provided these bounds, I would feel OK with submitting data if I believed in what they were trying to do.

So what do you think? Good idea, bad idea? What pitfalls haven’t I thought of yet?

Thanks for the feedback!