When you use GenAI products, you generate data that companies can use to further enhance those products. This is a longstanding practice across the tech industry – companies often design their products to maximize the amount and variety of data they can collect, because the business insights these data can produce are incredibly valuable. The data broker industry that trades in user data is valued at roughly $260 billion.
This industry-wide thirst for your data has created the “Big Data” phenomenon of the past decade. In turn, Big Data forms the core of GenAI: GenAI products are the latest application of advanced machine learning techniques that process vast amounts of data to train models to produce novel outputs. This means that data privacy issues with GenAI products aren’t new – they evoke similar issues raised by earlier machine learning technologies, and they compound past failures to pass meaningful data regulations that could have protected your personal data long before GenAI products existed.
But what does “your data” mean in this context? There are many different types of information about you that GenAI products collect and process when you interact with them, and the specifics depend on which product you are using. Generally, “user data” falls into two buckets: the information you knowingly provide (such as your account details and the prompts you submit) and the information collected automatically as you use the product (such as device, location, and usage data).
Because the U.S. lacks comprehensive federal data protection laws that constrain companies’ ability to collect your data, companies independently decide how much of your data they will collect and for what purposes. We rely on them to disclose their user data practices in privacy policies that can be ambiguous and even contradictory, but that also give companies legal cover when users “consent” to those policies in order to access and use GenAI products.
There are a handful of recent lawsuits, however, challenging these companies’ ability to collect data while you use GenAI products. P.M. v. OpenAI and A.T. v. OpenAI, for example, both accuse OpenAI of violating wiretapping and anti-hacking laws. The plaintiffs argue that OpenAI effectively hacks the platforms users access, exceeding the access users authorized when they use ChatGPT, and intercepts their private information (i.e., user data) without their knowledge or consent.
While the plaintiffs claim they received no notice from third-party websites that have integrated ChatGPT plugins, these websites generally have their own privacy policies that explicitly describe their ability to share user data with “business partners” and “service providers” – categories that would presumably include OpenAI.
Biometric data might be harder for companies to collect without complying with biometric privacy laws. In Flora v. Prisma Labs and P.M. v. OpenAI, the user-plaintiffs allege that both companies violated several requirements of the Illinois Biometric Information Privacy Act (BIPA), which protects Illinois residents’ sensitive biometric data – in these cases, their facial image data.
Biometric data can include information about what you look like (your facial features, irises, thumbprints or palmprints, etc.) and genetic data, which some laws fold under the “biometric data” umbrella; other laws treat both as types of personally identifiable information (PII). These are highly sensitive forms of data that many federal and state laws regulate in a variety of ways, because once your unique biometric data is compromised, that breach is nearly impossible to fix (at least without undergoing extreme physical or genetic alterations).
BIPA is a prime example of such regulations. It requires companies to follow several procedures before, during, and after collecting users’ biometric data. Companies must: (1) maintain a written, publicly available policy establishing a retention schedule and guidelines for destroying biometric data (§ 15(a)); (2) before they “collect, capture, purchase, receive through trade, or otherwise obtain” biometric data, inform users in writing of the collection and its specific purpose and duration, and obtain a written release from users (§ 15(b)); (3) refrain from selling, leasing, trading, or otherwise profiting from users’ biometric data (§ 15(c)); (4) refrain from disclosing or disseminating users’ biometric data without their consent, among other requirements (§ 15(d)); and (5) store, transmit, and protect biometric data from disclosure using “the reasonable standard of care” within the company’s industry, and in a manner at least as protective as how the company treats its other confidential and sensitive data (§ 15(e)). Plaintiffs need only show that a company failed to meet one of these requirements to make a strong BIPA claim.
The BIPA claims in these two cases are important because BIPA has proven to be an effective, if roundabout, way to chastise companies for invasive biometric data collection practices. Several powerful tech companies have chosen to settle BIPA lawsuits alleging that they similarly collected and relied on users’ biometric data without following BIPA’s procedures: the ACLU settled with Clearview AI over its use of facial biometric data; Facebook settled over its scanning of facial geometry in users’ photos; Google settled a similar suit, as did Snap (the company that owns Snapchat) and, most recently, TikTok. These were all multimillion-dollar settlements, except for the Clearview settlement, which instead severely restricted Clearview’s U.S. client base to law enforcement agencies.
In P.M., a subset of Illinois-based plaintiffs also allege that OpenAI violated BIPA’s procedural requirements. They claim that OpenAI collected and relied on their facial images – scraped from photos across the internet – to train image diffusion products like DALL-E, which can generate realistic depictions of human (and human-like) faces. As in Flora, the plaintiffs allege that OpenAI did not have a public written policy concerning its use of facial data, did not obtain a written release to collect and use this data from their images, and did not comply with several other BIPA requirements in developing and publishing DALL-E. If the case moves forward into discovery, OpenAI may finally have to divulge the internet sources of its training data beyond what others have already uncovered.