Gary Marcus

Why DO large language models hallucinate?

May 5, 2025


Regular readers of the newsletter will know that I have long been negatively impressed by the routine hallucinations of LLMs, and that my favorite example has involved an alleged pet chicken named Henrietta that I allegedly own:

This, sent to me by a reader in 2023, is of course a hallucination. I don’t actually own a pet chicken, nor any pet named Henrietta. If I did own a pet chicken I rather doubt I would call it Henrietta.

I do, however, know the legendary Harry Shearer, and was lucky enough to dine with him yesterday in LA. Antecedent to our breakfast, he sent me a problematic biography of himself (“vomited by AI”, in the words of a professor friend of his who forwarded it to him), along with the comment “No pet chicken, but still…”:

The joke was that, as we have come to expect, the AI’s output was truth mixed with untruth, intermingled so finely that to the uninitiated it might appear as truth — especially given the encyclopedia-like tone.

And indeed some of it is true. Harry really does act and do voiceovers for a lot of the Simpsons characters, and he did play the bass player in the legendary This Is Spinal Tap.

But, come on, the name of the bass player in Spinal Tap was Derek Smalls, not David Stanhill (a made-up name that doesn’t correspond to anyone Harry or I know, in that film or otherwise). And Harry is an American actor, not British, born and raised in Los Angeles. And whatever X may allege, Harry assures me that he didn’t have anything to do with Jaws.

And then there are the errors of omission; Harry didn’t just play Derek Smalls, he co-wrote This Is Spinal Tap, a rather important credit to omit. He wrote for and performed on Saturday Night Live, acted in many other movies from The Truman Show to A Mighty Wind, wrote and directed a documentary about Hurricane Katrina, and so on. No mention of his lovely and talented wife Judith Owen, his radio shows, his Primetime Emmy Award, or his Grammy nominations, either.

The extra embarrassing part of all of this for GenAI is that almost all of the above information could have been found quickly and easily with a two-second search for his Wikipedia page, and most of it could be found in the first screenful at that.

Three paragraphs and a box, and Google AI Overviews couldn’t get it right. Which raises the question: how could GenAI be so dumb that it could not fact check its own work against Wikipedia? (And also, for another time, how could anyone think that a system so dumb is tantamount to AGI?)

§

As they occasionally say in the entertainment business, thereby hangs a tale.

It is a tale of confusion: between what humans do, and what machines do.

That tale first started in 1965 with the simple AI system Eliza, which used a bunch of dopey keyword searches to fool humans into thinking it was far more intelligent than it actually was. You say “my girlfriend and I had a fight”, it matches the word “girlfriend” and spits back a phrase like “tell me more about your relationship”, and voila, some people imagine intelligence where there is nothing more than a simple party trick.
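
To see how little machinery is involved, here is a minimal sketch of an Eliza-style responder; the keywords and canned phrases below are my own invention for illustration, not Weizenbaum’s original script:

```python
import re

# Keyword-and-template rules in the spirit of Eliza; all patterns and
# responses here are invented for illustration.
RULES = [
    (re.compile(r"\b(girlfriend|boyfriend|wife|husband)\b", re.I),
     "Tell me more about your relationship."),
    (re.compile(r"\b(mother|father)\b", re.I),
     "How do you feel about your family?"),
    (re.compile(r"\bi am (.+)", re.I),
     "Why do you say you are {0}?"),
]

def eliza_reply(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please go on."  # stock deflection when nothing matches

print(eliza_reply("my girlfriend and I had a fight"))
# -> Tell me more about your relationship.
```

There is no model of relationships, fights, or girlfriends anywhere in that code, yet it can sound briefly convincing; that is the party trick.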

Because LLMs statistically mimic the language people have used, they often fool people into thinking that they operate like people.

But they don’t operate like people. They don’t, for example, ever fact check (as humans sometimes, when well motivated, do). They mimic the kinds of things people say in various contexts. And that’s essentially all they do.

You can think of the whole output of an LLM as a little bit like Mad Libs.

[Human H] is a [Nationality N] [Profession P] known for [Y].

By sheer dint of crunching unthinkably large amounts of data about which words co-occur in vast corpora of text, sometimes that works out. Shearer and Spinal Tap co-occur in enough text that the systems get that right. But that sort of statistical approximation lacks reliability. It is often right, but also routinely wrong. For example, some of the groups that Shearer belongs to, such as entertainers, actors, comedians, and musicians, include many people from Britain, and so words for entertainers and the like often co-occur with words like British. To a next-token predictor, a phrase like Harry Shearer lives in a particular part of a multidimensional space. Words in that space are often followed by words like “British actor”. So out comes a hallucination.
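
The point is easy to demonstrate with a toy next-token predictor. The micro-corpus below is hypothetical, but it captures the trap: one true record about Shearer sits among several statistically similar records about British entertainers, and majority co-occurrence wins:

```python
from collections import Counter, defaultdict

# A hypothetical micro-corpus of biography-like sentences.
corpus = [
    "john cleese is a british actor known for comedy",
    "peter sellers is a british actor known for comedy",
    "michael palin is a british actor known for comedy",
    "harry shearer is an american actor known for spinal tap",
]

# Count bigram transitions: which word follows which, and how often.
transitions = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        transitions[prev][nxt] += 1

def generate(seed: str, max_tokens: int = 8) -> str:
    out = [seed]
    for _ in range(max_tokens):
        options = transitions.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])  # most frequent next word wins
    return " ".join(out)

print(generate("harry"))
# -> harry shearer is a british actor known for comedy   (fluent, and false)
```

Real LLMs smooth over high-dimensional embeddings rather than raw bigram counts, but the structural weakness is the same: frequency is not truth.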

And although I don’t own a pet chicken named Henrietta, another Gary (Oswalt) illustrated a book with Henrietta in the title. In the word schmear that is LLMs, that was perhaps enough to get an LLM to synthesize the bogus sentence with me and Henrietta.

Of course the systems are probabilistic; not every LLM will produce a hallucination every time. But the problem is not going away; OpenAI’s recent o3 actually hallucinates more than some of its predecessors.

The chronic problem of fake citations in research papers and faked cases in legal briefs is a manifestation of the same problem; LLMs correctly “model” the structure of academic references, but often make up titles, page numbers, journals and so on, once again failing to sanity check their outputs against information (in this case, lists of references) that is readily found on the internet. So too is the rampant problem of numerical errors in financial reports, documented in a recent benchmark.
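
The irony is that this particular failure is cheap to catch. Here is a hedged sketch of the kind of sanity check the models skip, using Crossref’s public REST API (a real, free bibliographic endpoint); the matching heuristic is deliberately crude, and the second example title is invented:

```python
import requests

def citation_title_found(title: str) -> bool:
    """Crude existence check for a citation title against Crossref."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.title": title, "rows": 5},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    # Accept a hit only if some returned record's title matches exactly
    # (case-insensitively); a real checker would use fuzzier matching.
    return any(
        title.lower() == t.lower()
        for item in items
        for t in item.get("title", [])
    )

print(citation_title_found("Deep learning"))  # real Nature 2015 paper: should be True
print(citation_title_found("A Unified Theory of Henrietta the Chicken"))  # invented: should be False
```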

Just how bad is it? One recent study showed hallucination rates of between 15% and 60% across various models on a benchmark of 60 questions that were easily verifiable against CNN source articles directly supplied in the exam. Even the best performance (a 15% hallucination rate) is, for an open-book exam with the sources supplied, pathetic. That same study reports that “According to Deloitte, 77% of businesses who joined the study are concerned about AI hallucinations”.

If I can be blunt, it is an absolute embarrassment that a technology that has collectively cost about half a trillion dollars can’t do something as basic as (reliably) check its output against Wikipedia or a CNN article that is handed to it on a silver platter. But LLMs still cannot reliably do even things that basic, and on their own they may never be able to.

LLMs don’t actually know what a nationality is, or who Harry Shearer is; they know what words are, and they know which words predict which other words in context. They know what kinds of words cluster together, and in what order. And that’s pretty much it. They don’t operate like you and me. They don’t have a database of records like any proper business would (which would be a strong basis for solving the problem), and they don’t have what people like Yann LeCun or Judea Pearl or I would call a world model.

Even though they have surely digested Wikipedia, they can’t reliably stick to what is there (or justify their occasional deviations therefrom). They can’t even properly leverage the readily available database that parses Wikipedia’s infoboxes into machine-readable form, which really ought to be child’s play for any genuinely intelligent system. (Those systems also can’t reliably stick to the rules of chess despite having the rules, and millions of games, in their database, a manifestation of the related problem of extracting statistical tendencies without ever fully deriving and apprehending the correct abstractions.)
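
For a sense of what “child’s play” means here, this sketch pulls Shearer’s citizenship from Wikidata, the machine-readable database built in part from Wikipedia’s infoboxes, via its public SPARQL endpoint (the endpoint and property P27, “country of citizenship”, are real; error handling is kept minimal):

```python
import requests

# SPARQL: find the entity labeled "Harry Shearer" and return the label of
# its country of citizenship (Wikidata property P27).
QUERY = """
SELECT ?countryLabel WHERE {
  ?person rdfs:label "Harry Shearer"@en ;
          wdt:P27 ?country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "fact-check-sketch/0.1"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["countryLabel"]["value"])  # expected: United States of America
```

A system with even a shallow commitment to checking its claims could run exactly this kind of lookup before asserting a nationality.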

LLMs have their uses; next-token prediction makes for a kind of fancy autocomplete that is genuinely handy for coding, for example. But it is a fallacy to think that because GenAI outputs sound human-like, the computations behind them are human-like. LLMs mimic the rough structure of human language, yet eight years and roughly half a trillion dollars after their introduction, they continue to lack a grasp of reality.

And they have such a superficial understanding of their own output that they can’t begin to fact check it.

We will eventually get to something better, but continuing to put all our eggs in Henrietta’s LLM basket is absurd to the point of being delusional.

Dr. Gary Marcus has been warning people about the inherent problem of hallucinations in neural networks since 2001.

Bonus track: Harry Shearer contemplating LLMs, May 4, 2025, in his hometown of Los Angeles. Photo by the author.


