Advanced AI News
DeepSeek

Apple Says Claude, DeepSeek-R1, and o3-mini Can’t Really Reason

By Advanced AI Editor | June 9, 2025


AI critic Gary Marcus is smiling again, thanks to Apple. 

In a new paper titled The Illusion of Thinking, researchers from the Cupertino-based company argue that even the most advanced AI models, including the so-called large reasoning models (LRMs), don’t actually think. Instead, they simulate reasoning without truly understanding or solving complex problems.

The paper, released just ahead of Apple’s Worldwide Developers Conference, tested leading AI models, including OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking, using specially designed algorithmic puzzle environments rather than standard benchmarks.

The researchers argue that traditional benchmarks, like math and coding tests, are flawed due to “data contamination” and fail to reveal how these models actually “think”.

“We show that state-of-the-art LRMs still fail to develop generalisable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,” the paper noted.

Interestingly, one of the authors of the paper is Samy Bengio, the brother of Turing Award winner Yoshua Bengio. Yoshua recently launched LawZero, a Canada-based nonprofit AI safety lab working on building systems that prioritise truthfulness, safety, and ethical behaviour over commercial interests. 

The lab has secured around $30 million in initial funding from prominent backers, including former Google CEO Eric Schmidt’s philanthropic organisation, Skype co-founder Jaan Tallinn, Open Philanthropy, and the Future of Life Institute.

Backing the paper’s claims, Marcus could not contain his excitement. “AI is not hitting a wall. But LLMs probably are (or at least a point of diminishing returns). We need new approaches, and to diversify which roads are being actively explored.”

“I don’t think LLMs are a good way to get there (AGI). They might be part of the answer, but I don’t think they are the whole answer,” Marcus said in a previous interaction with AIM, stressing that LLMs are not “useless”. He also expressed optimism about AGI, describing it as a machine capable of approaching new problems with the flexibility and resourcefulness of a smart human being. “I think we’ll see it someday,” he further said.

Taking a more balanced view, Ethan Mollick, professor at The Wharton School, said in a post on X, “I think the Apple paper on the limits of reasoning models in particular tests is useful & important, but the “LLMs are hitting a wall” narrative on X around it feels premature at best. Reminds me of the buzz over model collapse—limitations that were overcome quickly in practice.”

He added that the current approach to reasoning likely has real limitations for a variety of reasons. However, the reasoning approaches themselves were made public less than a year ago. “There are just a lot of approaches that might overcome these issues. Or they may not. It’s just very early.”

Hemanth Mohapatra, partner at Lightspeed India, said that the recent Apple paper showing reasoning struggles with complex problems confirms what many experts, like Yann LeCun, have long sensed. He acknowledged that while a new direction is necessary, current AI capabilities still promise significant productivity gains.

“We do need a different hill to climb, but that doesn’t mean existing capabilities won’t have huge impact on productivity,” he said.

Meanwhile, Subbarao Kambhampati, professor at Arizona State University, who has been pretty vocal about LLMs’ inability to reason and think, quipped that another advantage of being a university researcher in AI is, “You don’t have to deal with either the amplification or the backlash as a surrogate for ‘The Company’. Your research is just your research, fwiw.”

How the Models Were Tested

Instead of relying on familiar benchmarks, Apple’s team used controlled puzzle environments, such as variants of the Tower of Hanoi, to precisely manipulate problem complexity and observe how models generate step-by-step “reasoning traces”. This allowed them to see not just the final answer, but the process the model used to get there.
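The appeal of Tower of Hanoi for this kind of study is that difficulty can be dialed up precisely: the optimal solution for n disks requires exactly 2^n − 1 moves, so each added disk is a controlled, exponential step up in complexity. As a minimal illustration (this is not Apple's actual evaluation harness), a standard recursive solver makes the growth explicit:

```python
# Illustrative sketch, not Apple's test code: a Tower of Hanoi solver.
# The optimal move count for n disks is 2^n - 1, which is why adding
# one disk gives a precisely controlled jump in problem complexity.

def hanoi_moves(n, src="A", dst="C", aux="B"):
    """Return the optimal move sequence for n disks as (from, to) pairs."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest, then re-stack.
    return (hanoi_moves(n - 1, src, aux, dst)
            + [(src, dst)]
            + hanoi_moves(n - 1, aux, dst, src))

for n in range(1, 8):
    print(n, len(hanoi_moves(n)))  # 1, 3, 7, 15, 31, 63, 127 moves
```

Because the full optimal move sequence is known in advance, a model's step-by-step output can be checked move by move, not just by its final answer.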

The paper found that for simpler problems, non-reasoning models often outperformed more advanced LRMs, which tended to “overthink” and miss the correct answer. 

As the difficulty level rose to moderate, the reasoning models showed their strength, successfully following more intricate logical steps. However, when faced with truly complex puzzles, all models, regardless of their architecture, struggled and ultimately failed. 

Rather than trying harder, the models produced shorter, less thoughtful responses, as if they were giving up.

While large language models continue to struggle with complex reasoning, that doesn’t make them useless. 

Abacus.AI CEO Bindu Reddy pointed out on X that many people are misinterpreting the paper as proof that LLMs don’t work. “All this paper is saying is LLMs can’t solve arbitrarily hard problems yet,” she said, adding that they’re already handling tasks beyond the capabilities of most humans.

Why Does This Happen?

The researchers suggest that what appears to be reasoning is often just the retrieval and adaptation of memorised solution templates from training data, not genuine logical deduction. 

When confronted with unfamiliar and highly complex problems, the models’ reasoning abilities tend to collapse almost immediately, revealing that what appears to be reasoning is often just an illusion of thought. 

The study makes it clear that current large language models are still far from being true general-purpose reasoners. Their ability to handle reasoning tasks does not extend beyond a certain level of complexity, and even targeted efforts to train them with the correct algorithms result in only minor improvements.

A Cover-Up for Siri’s Failure?

Andrew White, co-founder of FutureHouse, questioned Apple’s approach, saying that its AI researchers seem to have adopted an “anti-LLM cynic ethos” by repeatedly publishing papers that argue reasoning LLMs are fundamentally limited and lack generalisation ability. He pointed out the irony, saying Apple has “the worst AI products” like Siri and Apple Intelligence, and admitted he has no idea what their actual strategy is.

What This Means for the Future

Apple’s research serves as a cautionary message for AI developers and users alike. While today’s chatbots and reasoning models appear impressive, their core abilities remain limited. As the paper puts it, “despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds.”

“We need models that can represent and manipulate abstract structures, not just predict tokens. Hybrid systems that combine LLMs with symbolic logic, memory modules, or algorithmic planners are showing early promise. These aren’t just add-ons — they reshape how the system thinks,” said Pradeep Sanyal, AI and data leader at a global tech consulting firm, in a LinkedIn post.

He further added that combining neural and symbolic parts isn’t without drawbacks. It introduces added complexity around coordination, latency, and debugging. But the improvements in precision and transparency make it a direction worth exploring.



