OpenAI Inc. is casting a spotlight on a longstanding legal debate over how to balance a court’s need for information against protecting individuals’ personal data.
The AI giant has been challenging a court order to preserve its ChatGPT outputs in a precedent-setting copyright case, claiming the “sweeping, unprecedented order” threatens hundreds of millions of its users’ privacy rights.
OpenAI’s case is the first of its kind to reach the discovery stage, setting up a test of how courts will weigh novel legal questions against the privacy and security of the vast amounts of personal information at stake in this emerging era of AI-related litigation. The discovery dispute also serves as a warning to other technology companies, such as Google and Anthropic, about the need for strong data governance, and about preparing to hand over the troves of information increasingly sought by courts and plaintiffs.
“It’s a vivid illustration of the new legal and operational challenges posed by AI in the context of discovery, privacy, and proportionality,” said Bobby Malhotra, chair of Winston & Strawn LLP’s e-discovery and information governance practice.
Changing the Rules?
Tensions between discovery and data privacy have been bubbling ever since evidence moved from physical boxes of paper to digital files.
As part of discovery, parties can seek any evidence that is relevant and proportional to the needs of a case. The proportionality test determines the scope of discovery based on the importance of the issues at stake, the dollar amount in controversy, parties’ resources, and whether the burden of the proposed discovery outweighs its benefits.
OpenAI’s privacy argument comes on the heels of recent efforts to amend the rules governing civil litigation to better protect individuals’ personal information.
Lawyers for Civil Justice, an advocacy group made up of corporations, law firms, and defense bar organizations, has for several years been submitting suggestions to the Advisory Committee on Civil Rules to address the “increasingly intense conflict between discovery demands and privacy rights.”
The Federal Rules of Civil Procedure, which govern civil litigation, don’t sufficiently address obligations and risks of handling personal information, the group said in its latest proposal in March. The proportionality test is incomplete unless “courts and parties also consider the risk of harm caused by infringing privacy rights or exposing sensitive information to cyber security threats,” it wrote.
The group’s first comprehensive suggestions in 2023 were met with recommendations for a more targeted and “discrete” approach. The Advisory Committee hasn’t reviewed the new proposal yet.
Despite that backdrop, e-discovery professionals doubt that the copyright case by itself, or the rapid rise of AI, will be enough to prompt a rule change.
“The proportionality analysis is designed to balance cost and benefits in very much an economic sense,” said James C. Francis, a former magistrate judge in the US District Court for the Southern District of New York who is now a mediator at arbitration firm JAMS.
Trying to fit privacy into that equation would mean “asking how much is this privacy right worth,” he added. “And that’s a very odd question.”
The magistrate judge behind the order has so far been unconvinced by OpenAI’s privacy argument, repeatedly stating that the order would be temporary and that user data would be kept private.
Data Governance
This case will now test how courts apply principles like proportionality to a new wave of data-hungry technology: large language models.
While data volumes have been increasing for years, AI tools are “generating data at unprecedented volumes,” Winston & Strawn’s Malhotra said. “We’re seeing this in all facets of discovery, but it is something that is really being brought to the spotlight because of AI,” he added.
Tensions between discovery demands, companies’ privacy promises to consumers, and compliance with regulatory frameworks will likely continue to intensify. In the absence of rule updates, companies will have to explore alternative ways to mitigate privacy and cybersecurity concerns.
“Is there a way to address all of this under the existing prongs of relevance and proportionality?” said Jayashree Mitra, who represents clients in commercial disputes at Carlton Fields.
Some of the fears could be addressed through negotiations between parties about narrowing the review of the data once it’s preserved, she said, such as by limiting the personnel allowed to access it. Some of the data could also be reviewed in place, thereby limiting opportunities for breaches or unintentional disclosures.
The May 13 preservation order directed OpenAI to preserve all output log data that would otherwise be deleted, “whether such data might be deleted at a user’s request or because of ‘numerous privacy laws and regulations.’”
The tech giant balked at the order, calling it “unduly burdensome” given its 300 million weekly active users. The obligations would require “months of engineering time” and financial resources, OpenAI said.
“Not only the OpenAI engineers, but the Anthropic team, the Gemini team for Google, every other major model provider that has learned about this order has probably asked their engineering team, if we were ordered to do this, how would we go about it?” said Jeffrey M. Kelly, a Nelson Mullins partner who has advised clients on issues related to AI and e-discovery.
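To make that engineering question concrete, the sketch below shows one way a deletion pipeline could be modified to honor a litigation hold. It is a minimal illustration under assumed names, not a description of any provider’s actual systems; the record schema, the `ACTIVE_LEGAL_HOLDS` registry, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class OutputLog:
    """One model output record (hypothetical schema, for illustration only)."""
    log_id: str
    user_id: str
    created_at: datetime
    deletion_requested: bool = False  # e.g., the user asked for removal

# Hypothetical registry of litigation holds, maintained by counsel rather
# than engineers. A preservation order would add a category here.
ACTIVE_LEGAL_HOLDS: set[str] = {"output_logs"}

def can_delete(record: OutputLog, category: str) -> bool:
    """Return True only if no preservation obligation blocks deletion.

    Under a preservation order like the one described above, deletion is
    suspended even when a user requested it or a privacy-driven retention
    policy has expired.
    """
    if category in ACTIVE_LEGAL_HOLDS:
        return False  # a hold overrides user requests and scheduled purges
    return record.deletion_requested

def purge(records: list[OutputLog], category: str) -> list[OutputLog]:
    """Drop deletable records; keep everything subject to a hold."""
    return [r for r in records if not can_delete(r, category)]
```

The conditional itself is trivial; the burden lies in retrofitting a check like this into every pipeline capable of deleting data, which is where estimates like “months of engineering time” come from.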
OpenAI and other tech companies should be prepared to preserve swaths of data, including data they may have promised to delete, as AI-related litigation is poised to continue.
Despite potential technical and operational challenges, businesses will still be responsible for building strong data governance processes. This means understanding where user data is located and how long it’s being retained.
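One concrete form that understanding can take is a machine-readable data inventory that maps each category of user data to where it lives and how long it is kept. The sketch below is a minimal, hypothetical example; the categories, store locations, and retention periods are invented for illustration.

```python
from datetime import timedelta

# Hypothetical data inventory: category -> storage location and default
# retention period. Real inventories would cover far more categories.
DATA_INVENTORY = {
    "output_logs":  {"store": "s3://model-logs",     "retention": timedelta(days=30)},
    "chat_history": {"store": "postgres://prod/app", "retention": timedelta(days=365)},
}

def preservation_scope(held_categories: set[str]) -> list[str]:
    """List the stores a preservation order would touch, so engineering
    and counsel can scope the work before a court deadline arrives."""
    return [entry["store"]
            for category, entry in DATA_INVENTORY.items()
            if category in held_categories]

print(preservation_scope({"output_logs"}))  # -> ['s3://model-logs']
```

A company that can answer the scoping question this quickly is in a far better position to negotiate, or comply with, a preservation order than one that has to reverse-engineer its own retention practices under deadline.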
“Preserving the data itself is something that is not that controversial,” Mitra said, “because you should be preserving the data given the nature of the litigation.”
The case is The New York Times Company v. Microsoft Corp., S.D.N.Y., No. 1:23-cv-11195.