Patterns to help communities build healthy AI datasets

This session is facilitated by Fionntán O’Donnell and Rachel Wilson.


About this session

10 mins: Introduce the concept of design patterns found in collaboratively maintained datasets, using Mozilla’s Common Voice project as an example.

30 mins: Guided exercise. Explore how we might apply some of these patterns to existing machine learning datasets to increase trust and reduce bias. Each group will be given a selection of patterns and will discuss how they could be applied to example AI projects.

20 mins: Open discussion. What are participants’ experiences with maintaining healthy AI datasets? Can we spot commonalities? Do we have the beginnings of a new pattern we can share with the community?

Goals of this session

Introduce the concept of design patterns used by collaboratively maintained datasets such as OpenStreetMap and MusicBrainz. Understand how these patterns can help the AI community reduce bias in its datasets through community input. Discuss how a community can agree on what a “healthy” dataset means to its members.