DeepSeek AI Accused Of Training On Google Gemini Outputs Amid Data Contamination Concerns

Chinese AI lab DeepSeek is under renewed scrutiny following the release of its updated R1 model, with researchers suggesting it may have been trained on outputs from Google’s Gemini models.

Developer Sam Paech pointed to linguistic similarities between DeepSeek’s R1-0528 and Gemini 2.5 Pro, claiming in a post on X that the model’s phrasing patterns suggest DeepSeek switched from synthetic data generated by OpenAI models to synthetic data generated by Gemini. Another developer, the creator of the SpeechMap evaluation tool, said the DeepSeek model’s “traces” resemble those of Gemini.

This isn’t the first time DeepSeek has faced such allegations. In December, its V3 model appeared to misidentify itself as ChatGPT. OpenAI previously told the Financial Times that it had found evidence linking DeepSeek to distillation, the practice of training a model on the outputs of more advanced ones. Microsoft reportedly detected suspicious data exfiltration through OpenAI-linked developer accounts in late 2024.

While model similarities don’t prove misuse – many AIs echo common phrasing due to web content saturation – experts say the risk of “AI slop” in training data is growing. As a countermeasure, OpenAI and others have begun limiting API access and summarizing model traces to hinder unauthorized distillation.

“DeepSeek is short on GPUs and flush with cash,” said AI2 researcher Nathan Lambert. “Using synthetic data from top-tier models would be a logical shortcut.”
