OpenAI Tightens Access As Evidence Mounts Of AI Model Mimicry

In a bid to protect its crown jewels, OpenAI is now requiring government ID verification for developers who want access to its most advanced AI models.

While the move is officially about curbing misuse, a deeper concern is emerging: that OpenAI’s own outputs are being harvested to train competing AI systems.

A new research paper from Copyleaks, a company that specializes in AI content detection, offers evidence of why OpenAI may be acting now. Using a system that identifies the stylistic “fingerprints” of major AI models, Copyleaks found that its classifier labeled 74% of the outputs from the rival Chinese model DeepSeek-R1 as OpenAI-written.

This doesn’t just suggest overlap — it implies imitation.

Copyleaks’ classifier was also tested on other models, including Microsoft’s phi-4 and Elon Musk’s Grok-1. These models showed almost no similarity to OpenAI, scoring 99.3% and 100% “no-agreement” respectively, which indicates independent training. Mistral’s Mixtral model showed some similarities, but DeepSeek’s numbers stood out starkly.

[Figure: A chart showing stylistic “fingerprint” similarities to OpenAI models. Credit: Copyleaks research]

The research underscores how even when models are prompted to write in different tones or formats, they still leave behind detectable stylistic signatures — like linguistic fingerprints. These fingerprints persist across tasks, topics, and prompts, and can now be traced back to their source with some accuracy. That has enormous implications for detecting unauthorized model use, enforcing licensing agreements, and protecting intellectual property.
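To make the idea of a stylistic fingerprint concrete, here is a toy sketch in Python. It is not Copyleaks’ method, which the article does not describe in detail. It assumes a much cruder approach: counting character n-grams in a text and comparing the counts with cosine similarity. The function names, the reference labels and the sample sentences are all hypothetical.

```python
from collections import Counter
import math


def ngram_profile(text, n=3):
    """Count character n-grams: a crude stand-in for a stylistic fingerprint."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute(sample, reference_texts):
    """Return the reference label whose profile is closest to the sample, plus all scores."""
    sample_profile = ngram_profile(sample)
    scores = {
        label: cosine_similarity(sample_profile, ngram_profile(text))
        for label, text in reference_texts.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical reference texts standing in for two models' typical outputs.
references = {
    "model_a": "Certainly! Here is a concise overview of the topic.",
    "model_b": "yo so basically the thing is like this lol",
}
best_label, all_scores = attribute(
    "Certainly! Here is a concise summary of the topic.", references
)
```

A production detector would be trained on large samples from each model and would need to hold up across tones and formats, which is the property the research highlights. This sketch only shows the basic shape of comparing a text against known profiles.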

OpenAI didn’t respond to requests for comment. But the company discussed some reasons why it introduced the new verification process. “Unfortunately, a small minority of developers intentionally use the OpenAI APIs in violation of our usage policies,” it wrote when announcing the change recently.

OpenAI says DeepSeek might have ‘inappropriately distilled’ its models

Earlier this year, just after DeepSeek wowed the AI community with reasoning models that were similar in performance to OpenAI’s offerings, OpenAI was even more direct: “We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models.”

Distillation is a process where developers train new models using the outputs of other existing models. While such a technique is common in AI research, doing so without permission could violate OpenAI’s terms of service.
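The mechanics can be sketched with a deliberately tiny example. It assumes a hypothetical “teacher” that is just a linear function and a “student” fitted by least squares. Real distillation trains a neural network on text generated by a large model, but the pattern is the same: the student never sees the teacher’s internals, only its outputs.

```python
def teacher(x):
    """Hypothetical black-box model; the student only ever sees its outputs."""
    return 3.0 * x + 2.0


def distill(teacher_fn, inputs):
    """Fit a linear student (y = w * x + b) to the teacher's outputs by least squares."""
    outputs = [teacher_fn(x) for x in inputs]  # query the teacher
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Train the student purely on (input, teacher output) pairs.
w, b = distill(teacher, [0.0, 1.0, 2.0, 3.0, 4.0])


def student(x):
    """The distilled model: reproduces the teacher's behavior from its outputs alone."""
    return w * x + b
```

Here the student recovers the teacher’s behavior from five queries. With a model as complex as a large language model, far more queries are needed, which is why heavy API usage is one of the signals a provider might watch for.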

DeepSeek’s research paper about its new R1 model describes using distillation with open-source models, but it doesn’t mention OpenAI. I asked DeepSeek about these allegations of mimicry earlier this year and didn’t get a response.

Critics point out that OpenAI itself built its early models by scraping the web, including content from news publishers, authors, and creators — often without consent. So is it hypocritical for OpenAI to complain when others use its outputs in a similar way?

“It really comes down to consent and transparency,” said Alon Yamin, CEO of Copyleaks.

Training on copyrighted human content without permission is one kind of issue. But using the outputs of proprietary AI systems to train competing models is another — it’s more like reverse-engineering someone else’s product, he explained.

Yamin argues that while both practices are ethically fraught, training on OpenAI outputs raises competitive risks, as it essentially transfers hard-earned innovations without the original developer’s knowledge or compensation.

As AI companies race to build ever-more capable models, this debate over who owns what — and who can train on whom — is intensifying. Tools like Copyleaks’ digital fingerprinting system offer a potential way to trace and verify authorship at the model level. For OpenAI and its rivals, that may be both a blessing and a warning.
