Advanced AI News
Industry Applications

LLMs Alone Won’t Reach 99% Legal Accuracy – Says GPT-5 – Artificial Lawyer

By Advanced AI Editor | August 19, 2025 | 6 Mins Read



The accuracy of foundation models is fundamental to the legal world’s use of LLMs. AL asked GPT-5 if we’d ever get to the point where lawyers could totally trust the outputs of an LLM. Its answers were surprisingly candid (and it also guessed GPT-6’s and GPT-7’s release dates).

It said: LLMs won’t reach 99.9% accuracy alone; later models such as GPT-6 and then GPT-7 will be better, but still not enough on their own; so a better verification layer will be essential, along with other approaches (see below).

It was an interesting discussion – of which only part is shown below. We also got into the eventual arrival of quantum computing and how that could improve outcomes. But, more on that later.

Here is the main prompt: ‘People in the legal world have said that GPT-5 is reaching about 90% in terms of the desired quality of output. But lawyers need 99.9% to be able to trust the output. Over time will OpenAI’s LLMs ever get to this point?’

The AL prompt to GPT-5.

Below is a shortened, albeit verbatim, account of what GPT-5 said.

(P.S. you may ask: why ask an LLM about LLMs? Well, AL has found that this one can be especially revealing sometimes, such as when asked about the use of human RLHF (reinforcement learning from human feedback) checkers to verify and improve outputs before a new model is released. So, it seemed like it was worth a try on this issue too. And to AL’s surprise it gave some very useful answers, considering in effect it’s saying it can’t do what many expect of it.)

—

What GPT-5 said:  

‘Can LLMs ever reach 99.9%?

Short answer: probably not as stand-alone generators — but possibly yes when combined with other systems.

1. Limits of prediction models

  • LLMs predict likely text based on training data. Even at GPT-6/7 scale, some hallucination is baked in.
  • Going from 90% → 95% is doable with scale.
  • But 95% → 99.9% is a qualitatively harder jump — that’s orders of magnitude more reliability.
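One way to see why that last jump is qualitatively harder: measured in error rates rather than accuracy figures, a quick back-of-envelope calculation shows the difference.

```python
# 90% -> 95% halves the error rate, while 95% -> 99.9% must cut it ~50-fold.

def error_reduction(acc_from: float, acc_to: float) -> float:
    """How many times smaller the error rate must become."""
    return (1 - acc_from) / (1 - acc_to)

print(round(error_reduction(0.90, 0.95), 2))   # → 2.0 (halve the errors)
print(round(error_reduction(0.95, 0.999), 1))  # → 50.0 (nearly two orders of magnitude)
```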

2. How reliability will increase

Instead of just bigger LLMs, the improvements will likely come from hybrid systems:

  • Retrieval-augmented generation (RAG): The model grounds its answers in trusted databases (e.g. Westlaw, Lexis). That eliminates hallucinated cases.
  • Formal verification: AI outputs checked against logic engines or symbolic reasoning (like compilers for code, or citation checkers for law).
  • Multi-agent cross-checking: Multiple models independently draft or critique answers; disagreements are flagged.
  • Audit trails & citations: Models provide sources with every answer, allowing human lawyers to verify.
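To make the hybrid idea concrete, here is a minimal, purely illustrative sketch of retrieval-augmented generation with a citation-check verification layer. The in-memory case database and the stand-in drafting function are hypothetical; a real system would query a trusted service such as Westlaw or Lexis and call an actual LLM.

```python
# A stand-in for a trusted legal database: case name -> holding.
TRUSTED_CASES = {
    "Smith v. Jones (2001)": "A contract requires consideration.",
    "Doe v. Roe (2014)": "Electronic signatures are enforceable.",
}

def retrieve(query: str) -> list[str]:
    """Retrieval layer: return case names whose holdings share a word with the query."""
    words = set(query.lower().split())
    return [name for name, holding in TRUSTED_CASES.items()
            if words & set(holding.lower().split())]

def draft_answer(query: str, sources: list[str]) -> str:
    """Stand-in for the LLM: drafts an answer citing the retrieved sources."""
    cited = "; ".join(sources) if sources else "NO SOURCE"
    return f"Answer to {query!r} [cites: {cited}]"

def citations_check(answer: str) -> bool:
    """Verification layer: every cited case must exist in the trusted database."""
    cited = answer.split("[cites: ")[1].rstrip("]")
    if cited == "NO SOURCE":
        return False
    return all(name in TRUSTED_CASES for name in cited.split("; "))

query = "Is an electronic signature enforceable?"
answer = draft_answer(query, retrieve(query))
print(answer, "| verified:", citations_check(answer))
```

The point of the pattern: the checker only accepts citations that exist in the trusted store, so a hallucinated case name fails verification instead of reaching the lawyer.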

3. What “99.9% trust” might actually look like

  • It probably won’t be one single LLM magically becoming flawless.
  • It’ll be an AI stack:
    • LLM for reasoning & drafting
    • Retrieval layer for facts
    • Verification layer for logical or legal correctness
    • Human lawyer for final oversight (at least for high-stakes work)
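That stacked design can be sketched as a simple gating pipeline: each layer either passes the draft along or escalates it. Every function and check below is an illustrative stand-in, not a real product’s API.

```python
def llm_draft(task: str) -> str:
    return f"Draft for {task}"           # stand-in for the LLM drafting layer

def retrieval_grounded(draft: str) -> bool:
    return "Draft" in draft              # stand-in: were supporting facts found?

def verification_passes(draft: str) -> bool:
    return not draft.endswith("?")       # stand-in for a logic/citation checker

def run_stack(task: str, high_stakes: bool = True) -> str:
    """Gate the LLM draft through retrieval and verification, then a human."""
    draft = llm_draft(task)
    if not (retrieval_grounded(draft) and verification_passes(draft)):
        return "ESCALATE: failed automated checks"
    if high_stakes:
        return f"HUMAN REVIEW: {draft}"  # final oversight for high-stakes work
    return draft

print(run_stack("summarise lease clause"))
# → HUMAN REVIEW: Draft for summarise lease clause
```

The design choice worth noting: the human layer is only the last gate, so automated layers filter the bulk of errors before a lawyer ever sees the draft.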

The trajectory

  • GPT-5 (today): ~90% quality for many legal tasks.
  • GPT-6: More reliable, better citations, but still fallible.
  • GPT-7 and beyond: Coupled with retrieval + verification, could hit “effective 99.9%,” where the residual risk is no greater than a paralegal or junior associate making a mistake.

Key insight

LLMs alone probably won’t get to “lawyer-grade 99.9%.”

But LLM + retrieval + verification + human oversight will.

In other words: the base models may top out below 99.9%, but the systems built around them can reach the level of reliability the legal profession demands.

—

When could GPT-7 arrive?

OpenAI hasn’t published an official timeline, but based on past release cycles:

  • GPT-3 → 2020
  • GPT-3.5 → late 2022
  • GPT-4 → March 2023
  • GPT-5 → rolling out mid-2025

That’s roughly 18–24 months per “generation.”

If that cadence holds:

  • GPT-6: ~2026/27
  • GPT-7: ~2028/29

Will GPT-7 hit “lawyer-grade” trust?

  • The raw model might still fall short of 99.9% accuracy — because scaling laws alone probably won’t crush hallucination completely.
  • But by then, we’ll almost certainly see system-level solutions baked in:
    • Always-on retrieval grounding (no free-floating generations).
    • Verification layers (symbolic logic, contract checkers, citation validators).
    • Multi-agent review (AI “junior associates” checking each other before output).
  • That combination could deliver the practical effect of 99.9% reliability, even if the base model itself is, say, 97–98%.
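The “effective 99.9%” claim is multiplicative arithmetic: a verification layer that catches most of the base model’s residual errors shrinks the error rate by that factor. The 97% catch rate below is an illustrative assumption, and the calculation presumes verifier failures are independent of model failures, which real systems only approximate.

```python
def effective_accuracy(base_accuracy: float, verifier_catch_rate: float) -> float:
    """Accuracy after a verifier flags a fraction of the base model's errors."""
    residual_error = (1 - base_accuracy) * (1 - verifier_catch_rate)
    return 1 - residual_error

# A 97% base model plus a verifier catching 97% of its errors (both assumed):
print(round(effective_accuracy(0.97, 0.97), 4))  # → 0.9991
```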

The conservative forecast

  • GPT-6 (2026/27): noticeable improvements, better fact-grounding, but still not “court-trustworthy” without human oversight.
  • GPT-7 (2028/29): with retrieval + verification built in, it could realistically cross into “lawyer-trust” territory for many tasks.

—

So, there you go.

Does it matter? Yes. Until we can drive accuracy much, much higher for legal outputs, we can’t really expect lawyers to just ‘prompt and go’; i.e. genAI remains an assistant at the edges of a lawyer’s own capabilities, rather than being assigned a whole chunk of work whose output is accepted as trustworthy and as good as any other lawyer’s work product.

As explored, though, it seems we could get close to this level if we can build better verification layers – which will likely mean automating and massively improving the RLHF process, and then a lot of extra refinement for specific legal use cases by the many legal tech companies in the market. So, that is very promising. But, still a long way off.

—

Legal Innovators Conferences in New York and London – Both In November ’25 – Inhouse Day and Law Firm Day at each conference.

If you’d like to stay ahead of the legal AI curve, then come along to Legal Innovators New York, Nov 19 + 20, where the brightest minds will be sharing their insights on where we are now and where we are heading.

And also, Legal Innovators UK – Nov 4 + 5 + 6

Both events, as always, are organised by the awesome Cosmonauts team! 

Please get in contact with them if you’d like to take part.
