Last June, the Center for Investigative Reporting (CIR) became just the second nonprofit news organization in the U.S. to file a copyright suit against a major AI company. CIR, the publisher of both Mother Jones and Reveal, alleged that OpenAI and Microsoft used its copyright-protected stories to train generative AI products, including ChatGPT.
A year later, CIR’s suit has been consolidated into the most closely watched copyright case in the publishing industry. CIR’s claims are being litigated in federal court alongside those of a dozen other high-profile plaintiffs, including The New York Times, several daily papers owned by the hedge fund Alden Global Capital, and a class action composed of high-profile book authors. Apart from The Intercept, which is also part of the consolidated case, CIR remains the only nonprofit journalism organization in the U.S. to take OpenAI to court.
While CIR and The Intercept are attempting to carve out legal precedent, other industry-leading nonprofits have instead chosen to sign licensing deals with AI companies or to receive indirect funding via AI innovation programs.
“We wanted to be at the forefront of this litigation in part because so many others can’t do it,” said Monika Bauerlein, the CEO of the Center for Investigative Reporting. She noted that many cash-strapped nonprofits don’t have in-house general counsel or the resources to take on years-long legal action. CIR had both, and the will to defend the value of its original reporting. “We cannot have journalism once again be this free resource that is mined for an incredible amount of profit by a tech company, and wait for some handout at the pleasure of those companies.”
Many for-profit publishers have centered their complaints on how products like ChatGPT siphon off search traffic and erode advertising models. Bauerlein frames the crisis for nonprofit newsrooms through a slightly different lens, since many are not as reliant on conventional ads. “It’s not so much about the raw traffic as it is about the relationship,” she said. “The relationship between an audience and a journalist or a newsroom is broken when these models use content without permission or attribution.”
CIR’s suit is currently moving through the discovery phase, during which OpenAI and publishers are collecting and exchanging information to use at trial. The Southern District of New York (SDNY) is expected to announce a trial date soon. As the June 27 anniversary of the lawsuit filing approaches, I spoke to CIR and its legal team about how they’ve weathered this first year of litigation and why so few nonprofit publishers have followed in their footsteps.
The burden of discovery
CIR’s copyright claims don’t stray far from other suits currently winding their way through U.S. courts. The organization’s case is two-pronged. As publisher of the magazine Mother Jones, CIR is a copyright registration holder, meaning it has registered these print stories in bulk with the U.S. Copyright Office. CIR is using these registrations to claim traditional copyright infringement by OpenAI, a legal avenue that is not available to most digital-only publishers.
For its own digital works, including stories published on Revealnews.org and on Motherjones.com, CIR is also claiming that OpenAI violated the Digital Millennium Copyright Act (DMCA). This claim hinges on whether OpenAI removed copyright-related information when it fed CIR stories into its training data. That includes removing author names, headlines, and terms of use from articles. While the DMCA route is the most viable for digital publishers, it has seen a few setbacks recently after similar claims against OpenAI were dismissed outright or narrowed in scope.
In its original June 2024 filing, CIR cited the internal OpenAI data set WebText as evidence for these claims. The data set was created by scraping outbound links from Reddit; those links were then used to train GPT-2, an early version of the model that now powers ChatGPT. OpenAI itself published a list of the 1,000 most-used web domains in WebText back in 2019. Mother Jones came in at number 267 on the list, with 16,793 distinct URLs included from its website.
In OpenAI’s own methodology paper, researchers said they used algorithms called “Dragnet” and “Newspaper” to build WebText, both of which were made to extract the main content from a web page, leaving behind footers and copyright notices.
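To make the DMCA theory concrete: boilerplate-removal tools of this kind keep a page’s body text and discard the surrounding chrome, which is often exactly where copyright notices and terms of use live. The toy extractor below, written with Python’s standard library, sketches the general idea; the sample HTML and the tag-based rules are illustrative assumptions, not OpenAI’s actual pipeline (Dragnet and Newspaper use trained classifiers and DOM heuristics rather than a fixed tag whitelist).

```python
from html.parser import HTMLParser

class MainContentExtractor(HTMLParser):
    """Toy boilerplate remover: keep text inside <article>, skip <footer>.

    Illustrative only -- real extractors like Dragnet and Newspaper
    identify main content statistically, not by tag name.
    """
    def __init__(self):
        super().__init__()
        self.in_article = 0   # depth counters handle nested tags
        self.in_footer = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article += 1
        elif tag == "footer":
            self.in_footer += 1

    def handle_endtag(self, tag):
        if tag == "article":
            self.in_article -= 1
        elif tag == "footer":
            self.in_footer -= 1

    def handle_data(self, data):
        # Keep text only when inside the article body and outside the footer.
        if self.in_article and not self.in_footer:
            text = data.strip()
            if text:
                self.chunks.append(text)

# A hypothetical page: story text plus the metadata a DMCA claim cares about.
page = """
<html><body>
  <article>
    <h1>Story Headline</h1>
    <p>The body of the story.</p>
    <footer>&copy; 2024 Example Media. Terms of use apply.</footer>
  </article>
</body></html>
"""

parser = MainContentExtractor()
parser.feed(page)
extracted = " ".join(parser.chunks)
print(extracted)  # the copyright notice and terms of use are gone
```

Run against the sample page, the extractor keeps the headline and body but silently drops the footer's copyright line — the kind of stripping CIR's DMCA claim describes.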
Loevy & Loevy, a Chicago-based civil rights law firm with a long track record of suing for public records on behalf of journalists, is representing CIR. In the past couple years, Matt Topic, the lead lawyer on the case, has established a practice at the firm representing independent outlets taking on AI goliaths. He is representing The Intercept in the SDNY case, as well as the progressive news sites Raw Story and AlterNet.
One of the biggest reasons CIR has been able to afford this litigation is that Loevy &amp; Loevy is representing the nonprofit on a contingency basis. That means the firm gets paid only if it wins or settles the case. Even so, the case has been a resource drain.
“The costs to us are primarily in time,” said Bauerlein, explaining that CIR’s in-house counsel has been working tirelessly, with periodic support from several other employees on staff. Since the case moved into discovery, the workload has only grown. OpenAI has requested internal policy documents, lines of website code, and employment records going back years, alongside depositions with executives like Bauerlein. “My experience has been that deep-pocketed litigants will try to exhaust you at every step of the litigation,” she said.
During discovery CIR was required to appoint custodians — designated employees who must hold onto electronic or hard-copy files containing potential evidence. As a percentage of its total staff, CIR already has five times as many custodians as OpenAI (2.5% of employees as opposed to 0.5% of employees), but in court OpenAI demanded that CIR appoint still more custodians.
“Few nonprofit newsrooms have sued AI companies because out of the hundreds of them around the country, only a handful have in-house counsel,” said Victoria Baranetsky, general counsel for CIR. “Most attorneys at nonprofits already have their plates full with all of the legal issues that a newsroom faces these days.” Under the new Trump administration alone, news publishers have seen a spike in libel litigation and an increase in FOIA challenges.
For these reasons, discovery is one of the biggest deterrents keeping nonprofits from filing suits against AI companies. “I represent lots of newsrooms, but for smaller nonprofit and independent entities, I would imagine there’s some feeling of intimidation about what the discovery process involves,” said Topic.
Overall, the most contentious part of discovery in the CIR case so far has revolved around OpenAI’s resistance to saving ChatGPT user data.
Last month, Judge Sidney Stein ordered that OpenAI must retain user conversations so that ChatGPT’s outputs can be fully audited. OpenAI has vigorously opposed the order, filing multiple appeals and taking its pleas outside the courtroom. In a June 5 blog post, the company publicly criticized the request, writing that it “fundamentally conflicts with the privacy commitments we have made to our users.” OpenAI CEO Sam Altman followed up with a post on X.
we have been thinking recently about the need for something like “AI privilege”; this really accelerates the need to have the conversation.
imo talking to an AI should be like talking to a lawyer or a doctor.
i hope society will figure this out soon.
— Sam Altman (@sama) June 6, 2025
Data from businesses paying for ChatGPT Enterprise is not subject to this court order, and any data collected would be in a “legal hold” accessible only to a small audited team. Oral arguments on the issue are scheduled to take place on June 26.
“Altman believes there should be a privilege akin to a doctor-patient privilege or an attorney-client privilege for humans and robots. The law does not recognize such a privilege,” said Topic.
“We’ve seen this movie before”
Legal resources alone can’t explain why so few nonprofit publishers have chosen to sue AI companies. Topic says small, nonprofit outlets are likely taking a “wait and see” approach — biding their time with hopes that early rulings in federal courts will give some indication which way the winds are blowing for copyright owners.
“There will come a time in the not-too-distant future where there could be statute of limitations issues,” said Topic. “That time may come before the current lawsuits have been resolved — so wait-and-see may not work indefinitely.” Generally speaking, the statute of limitations for filing an infringement claim under the U.S. Copyright Act is three years, but it’s still not clear how restrictions like this will impact future AI-related litigation.
In the meantime, many of CIR’s peers in nonprofit journalism have opted to enter into direct and indirect partnerships with OpenAI and other major AI companies. The Associated Press — a nonprofit, albeit a much larger one than those involved in the SDNY case — was the first news organization to sign a licensing deal with OpenAI, back in the summer of 2023. In January, it also became the first to sign a deal with Google for its Gemini chatbot.
Last summer, The Texas Tribune became the first local news nonprofit to sign a licensing deal with a major AI company, joining Perplexity’s revenue-sharing program.
Many news nonprofits receive indirect funding from OpenAI through AI innovation programs. The American Journalism Project’s Product and AI Studio launched with a $5 million donation from OpenAI and counts several nonprofits among its current cohort, including The Marshall Project, The City, Chalkbeat, and Sahan Journal. The Lenfest Institute’s AI and local news collaborative, meanwhile, has $2.5 million in funding from OpenAI and Microsoft. Current participants include nonprofit big city dailies like The Baltimore Banner, The Philadelphia Inquirer, and The Chicago Sun-Times, as well as the investigative outlet ProPublica.
“We’ve seen this movie before,” said Bauerlein, calling back to the emergence of search engines in the early 2000s (namely, Google) and, more recently, social media platforms (namely, Facebook). “Historically, when tech companies are under pressure, they hand out crumbs from the table in the form of charity both to improve their public image and to put particularly nonprofit organizations into a quandary.” These relatively small donations put under-resourced newsrooms in a bind. “[Nonprofits] have an opportunity to get some support right now or bet on a litigation strategy that will be expensive and take time,” she said.
Still, CIR would consider its own deal with an AI company if the terms were right and the company were “in sync with our mission,” said Bauerlein. (Though no AI company has approached CIR since the nonprofit filed its suit against OpenAI.)
For now, Bauerlein said the harms posed to nonprofit journalism by generative AI are too great, and too immediate, to wait for an invitation. “It is critical to not let journalism be chewed up and spit out by technologies yet again,” she said. “This is not something that has a ten- or even five-year time horizon. This needs to be addressed now.”