Chinese AI lab DeepSeek is under renewed scrutiny following the release of its updated R1 model, with researchers suggesting it may have been trained on outputs from Google’s Gemini models.
Developer Sam Paech pointed to linguistic similarities between DeepSeek’s R1-0528 and Gemini 2.5 Pro, claiming in a post on X that the model’s phrasing patterns suggest a switch from OpenAI-based to Gemini-generated synthetic data. A second developer, the creator of the SpeechMap evaluation tool, said the model’s internal reasoning “traces” resemble Gemini’s.
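For readers curious what a “phrasing pattern” comparison can look like in practice, here is a minimal sketch of one common approach: building n-gram frequency profiles from samples of each model’s output and measuring how closely they overlap. This is purely illustrative and is not Paech’s actual methodology; the sample-loading step is hypothetical.

```python
# Illustrative only: one simple way to compare two models' word-choice
# "fingerprints" via n-gram frequency overlap. NOT Sam Paech's method.
from collections import Counter
import math

def ngram_profile(texts, n=2):
    """Count word n-grams across a corpus of model outputs."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[g] * b[g] for g in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical usage: a markedly higher similarity to Gemini samples than to
# GPT samples would be the kind of signal the claim describes. Suggestive
# evidence at best, since many models share common phrasing.
# r1, gem, gpt = load_samples(...)   # placeholder, not a real loader
# print(cosine_similarity(ngram_profile(r1), ngram_profile(gem)))
# print(cosine_similarity(ngram_profile(r1), ngram_profile(gpt)))
```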
This isn’t the first time DeepSeek has faced such allegations. In December, its V3 model appeared to misidentify itself as ChatGPT. OpenAI previously told the Financial Times that it had linked DeepSeek to data scraping via distillation, the practice of training a model on the outputs of a more advanced one. Microsoft reportedly detected suspicious data exfiltration from OpenAI-linked developer accounts in late 2024.
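To make the distillation allegation concrete, the sketch below shows the general shape of API-based distillation: querying a stronger “teacher” model and saving its responses as supervised fine-tuning data for a smaller “student” model. This is a generic illustration, not DeepSeek’s pipeline; `teacher_api.complete` and the output path are hypothetical.

```python
# Illustrative only: the general shape of API-based distillation.
# `teacher_api` and its `complete` method are hypothetical stand-ins.
import json

def build_distillation_set(prompts, teacher_api, out_path="synthetic_train.jsonl"):
    """Query a stronger 'teacher' model and save prompt/response pairs
    that a smaller 'student' model can later be fine-tuned on."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            response = teacher_api.complete(prompt)  # teacher model output
            f.write(json.dumps({"prompt": prompt, "response": response}) + "\n")
    return out_path

# The resulting JSONL becomes fine-tuning data for the student, which is why
# a teacher's phrasing quirks can later surface in the student's outputs.
```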
Model similarities alone don’t prove misuse, since many models converge on similar phrasing as the open web fills up with AI-generated text. Even so, experts say the risk of “AI slop” contaminating training data is growing. As a countermeasure, OpenAI and others have begun limiting API access and summarizing model reasoning traces to hinder unauthorized distillation.
“DeepSeek is short on GPUs and flush with cash,” said AI2 researcher Nathan Lambert. “Using synthetic data from top-tier models would be a logical shortcut.”