OpenAI Tightens Access As Evidence Mounts Of AI Model Mimicry

In a bid to protect its crown jewels, OpenAI is now requiring government ID verification for developers who want access to its most advanced AI models.

While the move is officially about curbing misuse, a deeper concern is emerging: that OpenAI’s own outputs are being harvested to train competing AI systems.

A new research paper from Copyleaks, a company that specializes in AI content detection, offers evidence of why OpenAI may be acting now. Using a system that identifies the stylistic “fingerprints” of major AI models, Copyleaks reported that 74% of the outputs from the rival Chinese model DeepSeek-R1 were classified as OpenAI-written.

This doesn’t just suggest overlap — it implies imitation.

Copyleaks’ classifier was also tested on other models, including Microsoft’s phi-4 and Elon Musk’s Grok-1. These models showed almost no similarity to OpenAI (99.3% and 100% “no-agreement,” respectively), suggesting independent training. Mistral’s Mixtral model showed some similarities, but DeepSeek’s numbers stood out starkly.

Chart: stylistic “fingerprint” similarities to OpenAI models. Source: Copyleaks research

The research underscores how even when models are prompted to write in different tones or formats, they still leave behind detectable stylistic signatures — like linguistic fingerprints. These fingerprints persist across tasks, topics, and prompts, and can now be traced back to their source with some accuracy. That has enormous implications for detecting unauthorized model use, enforcing licensing agreements, and protecting intellectual property.
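To make the reported percentages concrete, here is a minimal sketch of how such shares could be tallied. It assumes, based only on the “no-agreement” label quoted above, that several classifiers vote on each text and that an attribution counts only when every vote matches. The function names, the label strings, and the sample votes are all hypothetical and are not taken from the Copyleaks paper.

```python
from collections import Counter


def attribute(votes):
    """Return the label every classifier agrees on, else 'no-agreement'.

    Assumption: unanimous agreement is required for an attribution.
    """
    return votes[0] if len(set(votes)) == 1 else "no-agreement"


def attribution_shares(all_votes):
    """Map each attribution label to its share of the texts."""
    counts = Counter(attribute(votes) for votes in all_votes)
    total = len(all_votes)
    return {label: count / total for label, count in counts.items()}


# Made-up votes for four texts from three hypothetical classifiers.
sample_votes = [
    ("openai", "openai", "openai"),
    ("openai", "openai", "openai"),
    ("openai", "openai", "openai"),
    ("openai", "llama", "openai"),  # one dissenting vote
]

shares = attribution_shares(sample_votes)
print(shares)
```

Under this assumed rule, a single dissenting vote moves a text into the “no-agreement” bucket, so a high attributed share means the classifiers agreed on the same source for most texts.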

OpenAI didn’t respond to requests for comment. But the company discussed some reasons why it introduced the new verification process. “Unfortunately, a small minority of developers intentionally use the OpenAI APIs in violation of our usage policies,” it wrote when announcing the change recently.

OpenAI says DeepSeek might have ‘inappropriately distilled’ its models

Earlier this year, just after DeepSeek wowed the AI community with reasoning models that were similar in performance to OpenAI’s offerings, OpenAI was even clearer: “We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models.”

Distillation is a process where developers train new models using the outputs of other existing models. While such a technique is common in AI research, doing so without permission could violate OpenAI’s terms of service.
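The idea can be illustrated with a toy sketch. This is not how OpenAI, DeepSeek, or any production system works: the “teacher” here is a hypothetical one-line function, and the “student” is a straight-line fit. It only shows that a student trained on nothing but a teacher’s answers can end up reproducing the teacher’s behavior.

```python
import random


def teacher(x):
    # Hypothetical stand-in for a proprietary model. The student never
    # sees these numbers, only the outputs the teacher returns.
    return 3.0 * x + 2.0


def distill(teacher_fn, inputs):
    """Fit a linear student y = a*x + b to the teacher's outputs (least squares)."""
    targets = [teacher_fn(x) for x in inputs]  # query the teacher
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    var = sum((x - mean_x) ** 2 for x in inputs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


random.seed(0)
queries = [random.uniform(-5, 5) for _ in range(100)]
slope, intercept = distill(teacher, queries)
print(f"student learned y = {slope:.3f} * x + {intercept:.3f}")
```

The student recovers the teacher’s behavior purely from query-and-answer pairs, which is why access to a model’s outputs can be enough to transfer much of what it does.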

DeepSeek’s research paper about its new R1 model describes using distillation with open-source models, but it doesn’t mention OpenAI. I asked DeepSeek about these allegations of mimicry earlier this year and didn’t get a response.

Critics point out that OpenAI itself built its early models by scraping the web, including content from news publishers, authors, and creators — often without consent. So is it hypocritical for OpenAI to complain when others use its outputs in a similar way?

“It really comes down to consent and transparency,” said Alon Yamin, CEO of Copyleaks.

Training on copyrighted human content without permission is one kind of issue. But using the outputs of proprietary AI systems to train competing models is another — it’s more like reverse-engineering someone else’s product, he explained.

Yamin argues that while both practices are ethically fraught, training on OpenAI outputs raises competitive risks, as it essentially transfers hard-earned innovations without the original developer’s knowledge or compensation.

As AI companies race to build ever-more capable models, this debate over who owns what — and who can train on whom — is intensifying. Tools like Copyleaks’ digital fingerprinting system offer a potential way to trace and verify authorship at the model level. For OpenAI and its rivals, that may be both a blessing and a warning.
