Collection
Understanding the Work of Dataset Creators

The work of the people who make datasets is crucial: they build the architectures of ground truth that shape AI systems. Yet little research has focused on dataset creators or listened to what they have to say. In this project, we interview 18 dataset creators, and the conversations reveal the messy and contingent realities of dataset preparation. We hear about their practices and the shared challenges they face. We also offer a set of actionable recommendations to improve the practice of dataset creation while building a more responsible AI ecosystem.

Essay

ARCHITECTS OF AI: Insights from Dataset Creators

Will Orr & Kate Crawford
As part of the Knowing Machines Project, we have been interviewing dataset creators to understand the current developments in AI from the perspectives of those in charge of gathering data and sharing it with the technical community. Rather than investigating datasets as stable artifacts, we analyze "datasets-in-the-making" to uncover the uncertainties, personal assessments, and trade-offs made by dataset creators, amid their social, technical, and organizational contexts.
Article

The social construction of datasets: On the practices, processes and challenges of dataset creation for machine learning

Will Orr & Kate Crawford
Despite the critical role that datasets play in how systems make predictions and interpret the world, the dynamics of their construction are not well understood. Drawing on a corpus of interviews with dataset creators, we uncover the messy and contingent realities of dataset preparation.
Article

Building Better Datasets: Seven Recommendations for Responsible Design from Dataset Creators

Will Orr & Kate Crawford
The increasing demand for high-quality datasets in machine learning has raised concerns about whether these datasets are created ethically and responsibly. Dataset creators play a crucial role in developing responsible practices, yet their perspectives and expertise have not been highlighted in the current literature.