Synthetic Media Media
This project traces how media systems are using, interpreting, and anticipating Generative AI to create public life. We’re studying how the news industry frames Generative AI, when and why journalists are using it in their work, which policies and guidelines organizations are creating to regulate its use, and how people and infrastructures have the power to make Generative AI a public problem.
Calculating Empires
Calculating Empires is a new exhibition by Kate Crawford and Vladan Joler that opens at Fondazione Prada on November 23, 2023 at the Osservatorio in Milan. Joler and Crawford contextualize the current explosion of artificial intelligence by asking how we got here and considering where we might be going. Multiple works of critical cartography span the two floors and invite visitors to experience the longue durée of how technology and power have been intertwined since 1500.
Understanding the Work of Dataset Creators
The work of the people who make datasets is crucial. They build the architectures of ground truth that shape AI systems. Yet there has been very little research that has focused on dataset creators or listened to what they have to say. In this project, we speak with 18 different dataset creators in a series of interviews that reveal the messy and contingent realities of dataset preparation. We hear about their practices and the shared challenges they face. We offer a set of actionable recommendations that would improve the practice of dataset creation while also building a more responsible AI ecosystem.
Bird in hand
What can birding teach us about machine learning? And how is AI shaping how we interact with nature? Projects at the intersection of nature observation, citizen science, and machine learning offer useful case studies for examining systems of dataset production, model training and human feedback. They also present an alternative model to the extractive and exploitative “Big Data” approach to training machine learning algorithms, offering many possibilities as well as unique challenges for thinking through how we relate to AI systems.
Generative AI Legal Explainer
Generative AI raises a host of legal questions and concerns. Some of these questions will challenge existing legal rules and require new laws and policy frameworks. Others have answers that are quite well settled, even though the new AI context is bringing fresh attention to them.
Knowing Legal Machines
Many of the social questions raised by artificial intelligence are mediated through the legal system. Policymakers explore new rules to govern the technology, courts work to apply existing legal frameworks to new situations, and advocates propose entirely new approaches to deal with novel problems (or old problems with new prominence).
9 ways to see a Dataset
To further the understanding of training data, the Knowing Machines Project developed SeeSet, an investigative tool for examining the training datasets for AI. Here you will find nine essays from individual members of our team. Each one uses SeeSet to explore a key AI dataset and its role in the construction of 'ground truth.'
A CRITICAL FIELD GUIDE FOR WORKING WITH MACHINE LEARNING DATASETS
Maybe you’re an engineer creating a new machine vision system to track birds. You might be a journalist using social media data to research Costa Rican households. You could be a researcher who stumbled upon your university’s archive of handwritten census cards from 1939. Or a designer creating a chatbot that relies on large language models like GPT-3. Perhaps you’re an artist experimenting with visual style combinations using DALL-E 2. Or maybe you’re an activist with an urgent story that needs telling, and you’re searching for the right dataset to tell it.
CRITICAL DATASET STUDIES
This collection provides a curated reading list for researchers, practitioners, and students seeking to understand how machine learning datasets work, are utilised, and are influenced by various social, political, and ethical issues. The list is organised into various sections to help readers follow their specific interests and is primarily focused on academic publications. This list is also not meant to be exhaustive. We see the list as a living resource and invite readers to make suggestions and contributions. We hope this reading list might serve as a useful resource for scholars and practitioners investigating ML datasets as sociotechnical assemblages that shape and are shaped by social worlds.
- F. Corry, H. Sridharan, A.S. Luccioni, M. Ananny, J. Schultz, & K. Crawford, The Problem of Zombie Datasets: A Framework For Deprecating Datasets. arXiv preprint arXiv:2111.04424, 2021.
- S. Ciston, “A CRITICAL FIELD GUIDE FOR WORKING WITH MACHINE LEARNING DATASETS,” K. Crawford and M. Ananny, Eds., Knowing Machines project, Feb. 2023. https://knowingmachines.org/critical-field-guide
- Jason Schultz, The Right of Publicity: A New Framework for Regulating Facial Recognition, __ BROOKLYN L. REV. __ (2023), https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4243000.
- A. S. Luccioni, F. Corry, H. Sridharan, M. Ananny, J. Schultz & K. Crawford. 2022. A Framework for Deprecating Datasets: Standardizing Documentation, Identification, and Communication. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22). Association for Computing Machinery, New York, NY, USA, 199–212.
- M. Ananny, E. Kang, H. Sridharan, F. Corry. Tracing Training: Understanding Media Industries’ Machine Learning Datasets. Paper presented at the International Communication Association Annual Conference. Paris, France. May 2022.
- Corry, F., Kang, E. B., Sridharan, H., Luccioni, S., Ananny, M., & Crawford, K. (2022). Critical Dataset Studies Reading List. Knowing Machines. https://knowingmachines.org/reading-list
- Knowing Machines Res. Project, Comment on Proposed Trade Regulation Rule on Commercial Surveillance and Data Security (Dec. 1, 2022), https://www.regulations.gov/comment/FTC-2022-0053-1142.
- Jason Schultz and Melodi Dincer, Amici Brief of Science, Legal, and Technology Scholars in Renderos et al. v. Clearview AI, Inc. et al., No. RG21096898 (Superior Ct. Alameda County) (September 19, 2022), https://ssrn.com/abstract=4238870.