Apple Says Claude, DeepSeek-R1, and o3-mini Can’t Really Reason

By Advanced AI Bot | June 9, 2025


AI critic Gary Marcus is smiling again, thanks to Apple. 

In a new paper titled The Illusion of Thinking, researchers from the Cupertino-based company argue that even the most advanced AI models, including the so-called large reasoning models (LRMs), don’t actually think. Instead, they simulate reasoning without truly understanding or solving complex problems.

The paper, released just ahead of Apple’s Worldwide Developers Conference, tested leading AI models, including OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking, using specially designed algorithmic puzzle environments rather than standard benchmarks.

The researchers argue that traditional benchmarks, like math and coding tests, are flawed due to “data contamination” and fail to reveal how these models actually “think”.

“We show that state-of-the-art LRMs still fail to develop generalisable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,” the paper noted.

Interestingly, one of the authors of the paper is Samy Bengio, the brother of Turing Award winner Yoshua Bengio. Yoshua recently launched LawZero, a Canada-based nonprofit AI safety lab working on building systems that prioritise truthfulness, safety, and ethical behaviour over commercial interests. 

The lab has secured around $30 million in initial funding from prominent backers, including former Google CEO Eric Schmidt’s philanthropic organisation, Skype co-founder Jaan Tallinn, Open Philanthropy, and the Future of Life Institute.

Backing the paper’s claims, Marcus could not hold back his excitement. “AI is not hitting a wall. But LLMs probably are (or at least a point of diminishing returns). We need new approaches, and to diversify which roads are being actively explored.”

“I don’t think LLMs are a good way to get there (AGI). They might be part of the answer, but I don’t think they are the whole answer,” Marcus said in a previous interaction with AIM, stressing that LLMs are not “useless”. He also expressed optimism about AGI, describing it as a machine capable of approaching new problems with the flexibility and resourcefulness of a smart human being. “I think we’ll see it someday,” he further said.

Taking a more balanced view, Ethan Mollick, professor at The Wharton School, said in a post on X, “I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the “LLMs are hitting a wall” narrative on X around it feels premature at best. Reminds me of the buzz over model collapse—limitations that were overcome quickly in practice.”

He added that the current approach to reasoning likely has real limitations for a variety of reasons. However, the reasoning approaches themselves were made public less than a year ago. “There are just a lot of approaches that might overcome these issues. Or they may not. It’s just very early.”

Hemanth Mohapatra, partner at Lightspeed India, said that the recent Apple paper showing reasoning struggles with complex problems confirms what many experts, like Yann LeCun, have long sensed. He acknowledged that while a new direction is necessary, current AI capabilities still promise significant productivity gains.

“We do need a different hill to climb, but that doesn’t mean existing capabilities won’t have huge impact on productivity,” he said.

Meanwhile, Subbarao Kambhampati, professor at Arizona State University, who has been pretty vocal about LLMs’ inability to reason and think, quipped that another advantage of being a university researcher in AI is, “You don’t have to deal with either the amplification or the backlash as a surrogate for ‘The Company’. Your research is just your research, fwiw.”

How the Models Were Tested

Instead of relying on familiar benchmarks, Apple’s team used controlled puzzle environments, such as variants of the Tower of Hanoi, to precisely manipulate problem complexity and observe how models generate step-by-step “reasoning traces”. This allowed them to see not just the final answer, but the process the model used to get there.
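
To make that setup concrete, here is a minimal sketch of such a puzzle environment in Python. It is an illustration of the idea, not Apple’s actual evaluation harness: complexity is a single dial (the number of disks), and a model’s step-by-step move list can be checked exactly instead of being graded against possibly contaminated benchmark answers.

def initial_pegs(n):
    # Disk n at the bottom, disk 1 on top of the first peg.
    return [list(range(n, 0, -1)), [], []]

def verify_moves(n, moves):
    # Replay a model's proposed (source, target) moves, rejecting
    # illegal ones. Returns True only if every move is legal and the
    # puzzle ends solved, so a hallucinated trace fails deterministically.
    pegs = initial_pegs(n)
    for src, dst in moves:
        if not pegs[src]:
            return False                  # moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                  # larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))

def optimal_moves(n, src=0, aux=1, dst=2):
    # Classic recursive solution: 2**n - 1 moves, so difficulty grows
    # exponentially with each extra disk.
    if n == 0:
        return []
    return (optimal_moves(n - 1, src, dst, aux) + [(src, dst)]
            + optimal_moves(n - 1, aux, src, dst))

for n in range(1, 6):
    assert verify_moves(n, optimal_moves(n))

Because the verifier replays the whole trace, it checks the process rather than just the final answer, which is what lets researchers see exactly where a model’s reasoning breaks down.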

The paper found that for simpler problems, non-reasoning models often outperformed more advanced LRMs, which tended to “overthink” and miss the correct answer. 

As the difficulty level rose to moderate, the reasoning models showed their strength, successfully following more intricate logical steps. However, when faced with truly complex puzzles, all models, regardless of their architecture, struggled and ultimately failed. 

Rather than ramping up their effort, the models produced shorter and less thoughtful responses, as if they were giving up.
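
That “giving up” effect is measurable: the paper tracks how much reasoning text models produce as complexity rises. A rough sketch of such a probe is below; query_model is a hypothetical stand-in for whatever API returns a model’s chain-of-thought text, and word count is only a crude proxy for reasoning tokens.

def reasoning_effort(query_model, max_disks=12):
    # Map disk count -> approximate length of the model's reasoning trace.
    effort = {}
    for n in range(1, max_disks + 1):
        prompt = (f"Solve Tower of Hanoi with {n} disks. "
                  "Think step by step and list every move.")
        trace = query_model(prompt)       # hypothetical: returns a string
        effort[n] = len(trace.split())    # crude proxy for reasoning tokens
    return effort

On the paper’s account, this curve rises with difficulty and then, counterintuitively, falls just before accuracy collapses.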

While large language models continue to struggle with complex reasoning, that doesn’t make them useless. 

Abacus.AI CEO Bindu Reddy pointed out on X that many people are misinterpreting the paper as proof that LLMs don’t work. “All this paper is saying is LLMs can’t solve arbitrarily hard problems yet,” she said, adding that they’re already handling tasks beyond the capabilities of most humans.

Why Does This Happen?

The researchers suggest that what appears to be reasoning is often just the retrieval and adaptation of memorised solution templates from training data, not genuine logical deduction. 

When confronted with unfamiliar and highly complex problems, the models’ reasoning abilities tend to collapse almost immediately, revealing that what appears to be reasoning is often just an illusion of thought. 

The study makes it clear that current large language models are still far from being true general-purpose reasoners. Their ability to handle reasoning tasks does not extend beyond a certain level of complexity, and even targeted efforts to train them with the correct algorithms result in only minor improvements.

A Cover-Up for Siri’s Failure?

Andrew White, co-founder of FutureHouse, questioned Apple’s approach, saying that its AI researchers seem to have adopted an “anti-LLM cynic ethos” by repeatedly publishing papers arguing that reasoning LLMs are fundamentally limited and lack generalisation ability. He pointed out the irony, noting that Apple has “the worst AI products”, like Siri and Apple Intelligence, and admitted he has no idea what the company’s actual strategy is.

What This Means for the Future

Apple’s research serves as a cautionary message for AI developers and users alike. While today’s chatbots and reasoning models appear impressive, their core abilities remain limited. As the paper puts it, “despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds.”

“We need models that can represent and manipulate abstract structures, not just predict tokens. Hybrid systems that combine LLMs with symbolic logic, memory modules, or algorithmic planners are showing early promise. These aren’t just add-ons — they reshape how the system thinks,” said Pradeep Sanyal, AI and data leader at a global tech consulting firm, in a LinkedIn post.

He further added that combining neural and symbolic parts isn’t without drawbacks. It introduces added complexity around coordination, latency, and debugging. But the improvements in precision and transparency make it a direction worth exploring.
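
As a minimal sketch of the hybrid pattern Sanyal describes (the function names below are assumptions for illustration, not a real product API), a language model can handle intent and explanation while an exact symbolic planner does the search it cannot reliably fake:

def hanoi_planner(n, src=0, aux=1, dst=2):
    # Exact algorithmic planner: provably correct by construction.
    if n == 0:
        return []
    return (hanoi_planner(n - 1, src, dst, aux) + [(src, dst)]
            + hanoi_planner(n - 1, aux, src, dst))

def hybrid_answer(user_request, llm):
    # `llm` is a hypothetical callable mapping a prompt string to a reply string.
    n = int(llm(f"Reply with only the number of disks mentioned here: {user_request}"))
    plan = hanoi_planner(n)               # exact, verifiable computation
    return llm(f"Explain this verified {len(plan)}-move plan step by step: {plan}")

The division of labour is the point: token prediction is used where it is strong, parsing and explaining, while the correctness guarantee comes from the symbolic component.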


