Knowing Legal Machines

Many of the social questions raised by artificial intelligence are mediated through the legal system. Policymakers explore new rules to govern the technology, courts work to apply existing legal frameworks to new situations, and advocates propose entirely new approaches to deal with novel problems (or old problems with new prominence).

9 Ways to See a Dataset

To further the understanding of training data, the Knowing Machines Project developed SeeSet, an investigative tool for examining the training datasets for AI. Here you will find nine essays from individual members of our team. Each one uses SeeSet to explore a key AI dataset and its role in the construction of 'ground truth.'


Maybe you’re an engineer creating a new machine vision system to track birds. You might be a journalist using social media data to research Costa Rican households. You could be a researcher who stumbled upon your university’s archive of handwritten census cards from 1939. Or a designer creating a chatbot that relies on large language models like GPT-3. Perhaps you’re an artist experimenting with visual style combinations using DALL-E 2. Or maybe you’re an activist with an urgent story that needs telling, and you’re searching for the right dataset to tell it.

Reading List

This collection provides a curated reading list for researchers, practitioners, and students seeking to understand how machine learning datasets work, are utilised, and are influenced by various social, political, and ethical issues. The list is organised into various sections to help readers follow their specific interests and is primarily focused on academic publications. This list is also not meant to be exhaustive. We see the list as a living resource and invite readers to make suggestions and contributions. We hope this reading list might serve as a useful resource for scholars and practitioners investigating ML datasets as sociotechnical assemblages that shape and are shaped by social worlds.
  • F. Corry, H. Sridharan, A.S. Luccioni, M. Ananny, J. Schultz, & K. Crawford, The Problem of Zombie Datasets: A Framework For Deprecating Datasets. arXiv preprint arXiv:2111.04424, 2021.
  • S. Ciston, “A Critical Field Guide for Working with Machine Learning Datasets,” K. Crawford and M. Ananny, Eds., Knowing Machines project, Feb. 2023.
  • Jason Schultz, The Right of Publicity: A New Framework for Regulating Facial Recognition, __ BROOKLYN L. REV. __ (2023).
  • A. S. Luccioni, F. Corry, H. Sridharan, M. Ananny, J. Schultz & K. Crawford. 2022. A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 199–212.
  • M. Ananny, E. Kang, H. Sridharan, F. Corry. Tracing Training: Understanding Media Industries’ Machine Learning Datasets. Paper presented at the International Communication Association Annual Conference. Paris, France. May, 2022.
  • Corry, F., Kang, E. B., Sridharan, H., Luccioni, S., Ananny, M., & Crawford, K. (2022). Critical Dataset Studies Reading List. Knowing Machines.
  • Knowing Machines Res. Project, Comment on Proposed Trade Regulation Rule on Commercial Surveillance and Data Security (Dec. 1, 2022).
  • Jason Schultz and Melodi Dincer, Amici Brief of Science, Legal, and Technology Scholars in Renderos et al. v. Clearview AI, Inc. et al., No. RG21096898 (Superior Ct. Alameda County) (September 19, 2022).