Advanced AI News

GPT-5 Tops Harvey’s BigLaw Bench Eval – Artificial Lawyer

By Advanced AI Editor | August 8, 2025

As AL shared last night, Harvey – and other companies – have had early access to GPT-5. The genAI pioneer has analysed the new LLM’s outputs and marked it as the best-performing OpenAI model using its ‘BigLaw Bench’ AI evaluation system. It scored 89.22% overall.

The company launched BigLaw Bench (see AL article) last year to help with gauging the quality of genAI responses, in particular relative to how a lawyer would expect an acceptable response to read.

As they explained at the time – ‘Each task in BigLaw Bench is assessed using custom-designed rubrics that measure:

Answer Quality: Evaluates the completeness, accuracy, and appropriateness of the model’s response based on specific criteria essential for effective task completion.

Source Reliability: Assesses the model’s ability to provide verifiable and correctly cited sources for its assertions, enhancing trust and facilitating validation.

Scores are calculated by combining positive points for meeting task requirements and negative points for errors or missteps (e.g. hallucinations).

Those scores are then expressed as percentages.’
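As a rough illustration of the scoring approach quoted above, here is a minimal Python sketch. The `Criterion` class, the `rubric_score` function, the example rubric, and the choice to normalise by the maximum available positive points (flooring at zero) are all assumptions made for illustration; the article does not describe Harvey's actual rubrics or weighting.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric line (hypothetical structure, not Harvey's schema)."""
    description: str
    points: float     # positive = task requirement, negative = penalty
    triggered: bool   # requirement was met, or the error occurred


def rubric_score(criteria):
    """Combine positive and negative points into a percentage.

    Assumption: the percentage is earned points divided by the total
    positive points available, floored at zero.
    """
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points <= 0:
        raise ValueError("rubric needs at least one positive criterion")
    earned = sum(c.points for c in criteria if c.triggered)
    return max(0.0, 100.0 * earned / max_points)


# Invented example rubric for a single drafting task.
example = [
    Criterion("Identifies the governing clause", 4, True),
    Criterion("Summarises the obligations accurately", 3, True),
    Criterion("Cites a verifiable source", 3, False),
    Criterion("Hallucinated authority", -2, True),
]

print(rubric_score(example))
```

In this invented case the response earns 7 of 10 positive points and loses 2 for the hallucination, so a single error pulls the score down sharply, which is the behaviour the quoted description implies.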

And below is the chart they have provided. As you can see, GPT-5 scored 89.22%, a notable improvement of around 5 percentage points on the next closest result shown, which was another OpenAI model, o3, at 84.13%. (Note: Harvey uses other companies’ models, not just OpenAI, but those are not shown here.)

[Chart: BigLaw Bench scores for OpenAI models. Source: Harvey data, August 2025.]
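To put the gap between the two reported scores in context, the short Python sketch below works through the arithmetic. Only the two scores come from the article; the variable names and the "shortfall" framing (distance from 100%) are illustrative assumptions.

```python
# Scores reported by Harvey, as quoted in the article.
gpt5 = 89.22
o3 = 84.13

# Absolute gap, in percentage points.
gap_points = gpt5 - o3

# Relative improvement over o3's score.
relative_gain = gap_points / o3 * 100

# Share of o3's remaining shortfall (distance to 100%) that GPT-5 closes.
shortfall_closed = gap_points / (100 - o3) * 100

print(f"gap: {gap_points:.2f} points")
print(f"relative gain: {relative_gain:.2f}%")
print(f"shortfall closed: {shortfall_closed:.1f}%")
```

The gap is about 5.09 percentage points, or roughly a 6% relative gain, but it closes close to a third of the distance o3 had left to a perfect score.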

Moreover, this is really starting to get close to ‘last mile’ territory.

I.e. the closer we get to something where lawyers can go ‘yep, that’s fine, let it through’, the harder and harder it gets.

Getting to ‘it’s kind of right, but needs some work to get to the level I want’ is relatively easy for many LLMs. But getting up to 90%, and then into that massive last mile on the journey to 99%, is a totally different experience.

But, we are moving in the right direction. Plus, these outputs will improve as Harvey (and other legal tech companies) apply refinement, system prompting, and orchestration with related data.

Which raises the question: can we ever get to 99.9% on BigLaw Bench? Probably not for some years yet, but eventually…? Why not. It goes back to the Waymo analogy this site has used a few times now: getting to the level of success where people just go with it is incredibly hard to do in a super-complex, unstructured environment, but, as Waymo showed, it can be done with enough time and investment.

Will new genAI models get much better? It’s hard to say. There will be incremental improvements for sure. But, bigger steps may come from other strategies, such as improving the verification layer.

Either way, we are making progress, and at an incredible pace. In three years we have gone from scepticism about AI, to now a majority of large law firms engaging deeply with the technology – so too their clients. And central to this change is the performance of the models. If those LLMs didn’t deliver, then the lawyers would not be so enthusiastic about the current wave of legal AI tools.

—

Right, what else?

In Harvey’s blog post on the new model, they also added some details about their own plans on how to leverage GPT-5:

‘Integrated into Harvey’s systems, these baseline capabilities can be leveraged to enable more powerful use cases in the document drafting and complex research domains. GPT-5 is also the first orchestration model that appears capable of combining these tasks—allowing for a single agent to both collaborate with a user on the research and produce the finished work product.

For example, on a task like: ‘Identify if any of these internal guidance documents are inconsistent with current regulation, we operate in the United States and the European Union’ . . . GPT-5 can be used to orchestrate agents that:

Review the internal documents to identify relevant trends to search for;

Find recent changes in global regulation;

Perform a comprehensive review of any gaps between the two; and

Draft a memo of recommendations of how to best update your internal guidance to stay aligned with the new regulatory environment.

All while prompting the user as needed for additional context to ensure it reaches the goal as expected.

Coupled with our recently-announced data partnerships with LexisNexis and iManage, Harvey is now able to see the full picture – public and proprietary – before it acts. With GPT-5’s substantially improved tool-use and drafting capabilities, we can now build a deeply integrated AI system that reasons over an organization’s internal data and leverages trusted third-party content in real-time.

Building an Intelligent Coworker

Complex matters don’t unfold linearly; they advance dynamically through iteration, and in close collaboration with internal and external stakeholders. With GPT-5, and our product and data ingredients in place, Harvey’s north star of creating an intelligent coworker comes into focus.’

—

You can find more about Harvey and read the original post here. Thanks to CEO Winston Weinberg and team for sharing.

—

Legal Innovators Conferences in New York and London – Both In November ’25

If you’d like to stay ahead of the legal AI curve, then come along to Legal Innovators New York, Nov 19 + 20, where the brightest minds will be sharing their insights on where we are now and where we are heading.

And also, Legal Innovators UK – Nov 4 + 5 + 6

Both events, as always, are organised by the awesome Cosmonauts team! 

Please get in contact with them if you’d like to take part.
