Knowing Machines Research

Research

Calculating Empires

Calculating Empires has won the Silver Lion Award at the 2025 Venice Architettura Biennale as well as the European Commission's 2024 Grand Prize for Artistic Exploration in Science, Technology, and the Arts. The jury noted that 'Calculating Empires challenges us to redefine our relationship with current socio-technical structures. By asking how we got where we are today, we can reconsider where we might be going.' Previous winners include Richard Mosse, Holly Herndon and Mat Dryhurst.

Visual Story

Models all the Way Down

LAION-5B is an open-source foundation dataset used to train AI models such as Stable Diffusion. It contains 5.8 billion image and text pairs—a size too large to make sense of. In this visual investigation, we follow the construction of the dataset to better understand its contents, implications and entanglements.

Collection

Synthetic Media Media

This project traces how media systems are using, interpreting, and anticipating Generative AI to create public life. We’re studying how the news industry frames Generative AI, when and why journalists are using it in their work, which policies and guidelines organizations are creating to regulate its use, and how people and infrastructures have the power to make Generative AI a public problem.

Collection

Understanding the Work of Dataset Creators

The work of the people who make datasets is crucial. They build the architectures of ground truth that shape AI systems. Yet very little research has focused on dataset creators or listened to what they have to say. In this project, we speak with 18 different dataset creators in a series of interviews that reveal the messy and contingent realities of dataset preparation. We hear about their practices and the shared challenges they face. We offer a set of actionable recommendations that would improve the practice of dataset creation while also building a more responsible AI ecosystem.

Collection

Bird in hand

What can birding teach us about machine learning? And how is AI shaping how we interact with nature? Projects at the intersection of nature observation, citizen science, and machine learning offer useful case studies for examining systems of dataset production, model training and human feedback. They also present an alternative model to the extractive and exploitative “Big Data” approach to training machine learning algorithms, offering many possibilities as well as unique challenges for thinking through how we relate to AI systems.

Explainer

Generative AI Legal Explainer

Generative AI raises a host of legal questions and concerns. Some of these questions will challenge existing legal rules and require new laws and policy frameworks. Others have answers that are quite well settled, notwithstanding the new AI context bringing attention to them.

Collection

Knowing Legal Machines

Many of the social questions raised by artificial intelligence are mediated through the legal system. Policymakers explore new rules to govern the technology, courts work to apply existing legal frameworks to new situations, and advocates propose entirely new approaches to deal with novel problems (or old problems with new prominence).

Collection

9 ways to see a Dataset

To further the understanding of training data, the Knowing Machines Project developed SeeSet, an investigative tool for examining the training datasets for AI. Here you will find nine essays from individual members of our team. Each one uses SeeSet to explore a key AI dataset and its role in the construction of 'ground truth.'

Guide

A CRITICAL FIELD GUIDE FOR WORKING WITH MACHINE LEARNING DATASETS

Maybe you’re an engineer creating a new machine vision system to track birds. You might be a journalist using social media data to research Costa Rican households. You could be a researcher who stumbled upon your university’s archive of handwritten census cards from 1939. Or a designer creating a chatbot that relies on large language models like GPT-3. Perhaps you’re an artist experimenting with visual style combinations using DALL-E 2. Or maybe you’re an activist with an urgent story that needs telling, and you’re searching for the right dataset to tell it.

Reading List

CRITICAL DATASET STUDIES

This collection provides a curated reading list for researchers, practitioners, and students seeking to understand how machine learning datasets work, are utilised, and are influenced by various social, political, and ethical issues. The list is organised into various sections to help readers follow their specific interests and is primarily focused on academic publications. This list is also not meant to be exhaustive. We see the list as a living resource and invite readers to make suggestions and contributions. We hope this reading list might serve as a useful resource for scholars and practitioners investigating ML datasets as sociotechnical assemblages that shape and are shaped by social worlds.

PUBLICATIONS

F. Corry, H. Sridharan, A.S. Luccioni, M. Ananny, J. Schultz, & K. Crawford, The Problem of Zombie Datasets: A Framework For Deprecating Datasets. arXiv preprint arXiv:2111.04424, 2021.
S. Ciston, “A CRITICAL FIELD GUIDE FOR WORKING WITH MACHINE LEARNING DATASETS,” K. Crawford and M. Ananny, Eds., Knowing Machines project, Feb. 2023. https://knowingmachines.org/critical-field-guide
Jason Schultz, The Right of Publicity: A New Framework for Regulating Facial Recognition, __ BROOKLYN L. REV. __ (2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4243000.
A. S. Luccioni, F. Corry, H. Sridharan, M. Ananny, J. Schultz & K. Crawford. 2022. A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 199–212. https://doi.org/10.1145/3531146.3533086.
M. Ananny, E. Kang, H. Sridharan, F. Corry. Tracing Training: Understanding Media Industries’ Machine Learning Datasets. Paper presented at the International Communication Association Annual Conference. Paris, France. May, 2022.
Corry, F., Kang, E. B., Sridharan, H., Luccioni, S., Ananny, M., & Crawford, K. (2022). Critical Dataset Studies Reading List. Knowing Machines. https://knowingmachines.org/reading-list
Knowing Machines Res. Project, Comment on Proposed Trade Regulation Rule on Commercial Surveillance and Data Security (Dec. 1, 2022), https://www.regulations.gov/comment/FTC-2022-0053-1142.
Jason Schultz and Melodi Dincer, Amici Brief of Science, Legal, and Technology Scholars in Renderos et al. v. Clearview AI, Inc. et al., No. RG21096898 (Superior Ct. Alameda County) (September 19, 2022), https://ssrn.com/abstract=4238870.