Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Perplexity AI Compares XRP to Top Altcoins

Cohere, Ottawa sign non-binding agreement on government AI uses

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
VentureBeat AI

GEPA optimizes LLMs without costly reinforcement learning

By Advanced AI EditorAugust 19, 2025No Comments8 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

Researchers from the University of California, Berkeley, Stanford University and Databricks have introduced a new AI optimization method called GEPA that significantly outperforms traditional reinforcement learning (RL) techniques for adapting large language models (LLMs) to specialized tasks.

GEPA removes the popular paradigm of learning through thousands of trial-and-error attempts guided by simple numerical scores. Instead, it uses an LLM’s own language understanding to reflect on its performance, diagnose errors, and iteratively evolve its instructions. In addition to being more accurate than established techniques, GEPA is significantly more efficient, achieving superior results with up to 35 times fewer trial runs.

For businesses building complex AI agents and workflows, this translates directly into faster development cycles, substantially lower computational costs, and more performant, reliable applications.

The high cost of optimizing modern AI systems

Modern enterprise AI applications are rarely a single call to an LLM. They are often “compound AI systems,” complex workflows that chain multiple LLM modules, external tools such as databases or code interpreters, and custom logic to perform sophisticated tasks, including multi-step research and data analysis.

AI Scaling Hits Its Limits

Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:

Turning energy into a strategic advantage

Architecting efficient inference for real throughput gains

Unlocking competitive ROI with sustainable AI systems

Secure your spot to stay ahead: https://bit.ly/4mwGngO

A popular way to optimize these systems is through reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), a technique employed in popular reasoning models, including DeepSeek-R1. This method treats the system as a black box; it runs a task, gets a simple success metric (a “scalar reward,” like a score of 7/10), and uses this feedback to slowly nudge the model’s parameters in the right direction.

The major drawback of RL is its sample inefficiency. To learn effectively from these sparse numerical scores, RL methods often require tens of thousands, or even hundreds of thousands, of trial runs, known as “rollouts.” For any real-world enterprise application that involves expensive tool calls (e.g., API queries, code compilation) or uses powerful proprietary models, this process is prohibitively slow and costly.

As Lakshya A Agrawal, co-author of the paper and doctoral student at UC Berkeley, told VentureBeat, this complexity is a major barrier for many companies. “For many teams, RL is not practical due to its cost and complexity—and their go-to approach so far would often just be prompt engineering by hand,” Agrawal said. He noted that GEPA is designed for teams that need to optimize systems built on top-tier models that often can’t be fine-tuned, allowing them to improve performance without managing custom GPU clusters.

The researchers frame this challenge as follows: “How can we extract maximal learning signal from every expensive rollout to enable effective adaptation of complex, modular AI systems in low-data or budget-constrained settings?”

An optimizer that learns with language

GEPA framework Source: arXiv

GEPA (Genetic-Pareto) is a prompt optimizer that tackles this challenge by replacing sparse rewards with rich, natural language feedback. It leverages the fact that the entire execution of an AI system (including its reasoning steps, tool calls, and even error messages) can be serialized into text that an LLM can read and understand. GEPA’s methodology is built on three core pillars.

First is “genetic prompt evolution,” where GEPA treats a population of prompts like a gene pool. It iteratively “mutates” prompts to create new, potentially better versions. This mutation is an intelligent process driven by the second pillar: “reflection with natural language feedback.” After a few rollouts, GEPA provides an LLM with the full execution trace (what the system tried to do) and the outcome (what went right or wrong). The LLM then “reflects” on this feedback in natural language to diagnose the problem and write an improved, more detailed prompt. For instance, instead of just seeing a low score on a code generation task, it might analyze a compiler error and conclude the prompt needs to specify a particular library version.

The third pillar is “Pareto-based selection,” which ensures smart exploration. Instead of focusing only on the single best-performing prompt, which can lead to getting stuck in a suboptimal solution (a “local optimum”), GEPA maintains a diverse roster of “specialist” prompts. It tracks which prompts perform best on different individual examples, creating a list of top candidates. By sampling from this diverse set of winning strategies, GEPA ensures it explores more solutions and is more likely to discover a prompt that generalizes well across a wide range of inputs.

Selecting a single best candidate (left) can result in models getting stuck in local minima while Pareto selection (right) can explore more options and find optimal solutions Source: arXiv

The effectiveness of this entire process hinges on what the researchers call “feedback engineering.” Agrawal explains that the key is to surface the rich, textual details that systems already produce but often discard. “Traditional pipelines often reduce this detail to a single numerical reward, obscuring why particular outcomes occur,” he said. “GEPA’s core guidance is to structure feedback that surfaces not only outcomes but also intermediate trajectories and errors in plain text—the same evidence a human would use to diagnose system behavior.”

For example, for a document retrieval system, this means listing which documents were retrieved correctly and which were missed, rather than just calculating a final score.

GEPA in action

The researchers evaluated GEPA across four diverse tasks, including multi-hop question answering (HotpotQA) and privacy-preserving queries (PUPA). They used both open-source (Qwen3 8B) and proprietary (GPT-4.1 mini) models, comparing GEPA against the RL-based GRPO and the state-of-the-art prompt optimizer MIPROv2.

Across all tasks, GEPA substantially outperformed GRPO, achieving up to a 19% higher score while using up to 35 times fewer rollouts. Agrawal provided a concrete example of this efficiency gain: “We used GEPA to optimize a QA system in ~3 hours versus GRPO’s 24 hours—an 8x reduction in development time, while also achieving 20% higher performance,” he explained. “RL-based optimization of the same scenario in our test cost about $300 in GPU time, while GEPA cost less than $20 for better results—15x savings in our experiments.”

GEPA outperforms other baselines on key benchmarks Source: arXiv

Beyond raw performance, the researchers found that GEPA-optimized systems are more reliable when faced with new, unseen data. This is measured by the “generalization gap” (the difference between performance on training data and final test data). Agrawal hypothesizes that this is because GEPA learns from richer feedback. “GEPA’s smaller generalization gap may stem from its use of rich natural-language feedback on each outcome—what worked, what failed, and why—rather than relying solely on a single scalar reward,” he said. “This may encourage the system to develop instructions and strategies grounded in a broader understanding of success, instead of merely learning patterns specific to the training data.” For enterprises, this improved reliability means less brittle, more adaptable AI applications in customer-facing roles.

A major practical benefit is that GEPA’s instruction-based prompts are up to 9.2 times shorter than prompts produced by optimizers like MIPROv2, which include many few-shot examples. Shorter prompts decrease latency and reduce costs for API-based models. This makes the final application faster and cheaper to run in production.

The paper also presents promising results for utilizing GEPA as an “inference-time” search strategy, transforming the AI from a single-answer generator into an iterative problem solver. Agrawal described a scenario where GEPA could be integrated into a company’s CI/CD pipeline. When new code is committed, GEPA could automatically generate and refine multiple optimized versions, test them for performance, and open a pull request with the best-performing variant for engineers to review. “This turns optimization into a continuous, automated process—rapidly generating solutions that often match or surpass expert hand-tuning,” Agrawal noted. In their experiments on CUDA code generation, this approach boosted performance on 20% of tasks to an expert level, compared to 0% for a single-shot attempt from GPT-4o.

The paper’s authors believe GEPA is a foundational step toward a new paradigm of AI development. But beyond creating more human-like AI, its most immediate impact may be in who gets to build high-performing systems.

“We expect GEPA to enable a positive shift in AI system building—making the optimization of such systems approachable by end-users, who often have the domain expertise relevant to the task, but not necessarily the time and willingness to learn complex RL specifics,” Agrawal said. “It gives power directly to the stakeholders with the exact task-specific domain knowledge.”

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleThis Is Why Jim Cramer Is Bullish on IBM Stock
Next Article 95% of generative AI pilots at companies are failing
Advanced AI Editor
  • Website

Related Posts

DeepSeek V3.1 just dropped — and it might be the most powerful open AI yet

August 19, 2025

VB AI Impact Series: Can you really govern multi-agent AI?

August 19, 2025

Keychain launches AI operating system for CPG manufacturers

August 19, 2025

Comments are closed.

Latest Posts

Barbara Hepworth Sculpture Will Remain in UK After £3.8 M. Raised

After 12-Year Hiatus, Egypt’s Alexandria Biennale Will Return

Senator Seeks Investigation into Jeffrey Epstein’s Work for Leon Black

Spike Lee’s ‘Highest 2 Lowest’ Features Art From His Own Collection

Latest Posts

Perplexity AI Compares XRP to Top Altcoins

August 19, 2025

Cohere, Ottawa sign non-binding agreement on government AI uses

August 19, 2025

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR

August 19, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Perplexity AI Compares XRP to Top Altcoins
  • Cohere, Ottawa sign non-binding agreement on government AI uses
  • ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR
  • How Infosys built a generative AI solution to process oil and gas drilling data with Amazon Bedrock
  • MIT Report Finds Most AI Business Investments Fail, Reveals ‘GenAI Divide’ — Virtualization Review

Recent Comments

  1. Jimmyjaito on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. Charliecep on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  3. nekretnine-zabljak-placevi-753 on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  4. Jimmyjaito on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  5. ScottLag on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.