Advanced AI News
DataRobot Blog

Designing Pareto-optimal GenAI workflows with syftr

By Advanced AI Bot · May 28, 2025 · 10 min read


You’re not short on tools. Or models. Or frameworks.

What you’re short on is a principled way to use them — at scale.

Building effective generative AI workflows, especially agentic ones, means navigating a combinatorial explosion of choices.

Every new retriever, prompt strategy, text splitter, embedding model, or synthesizing LLM multiplies the space of possible workflows, resulting in a search space with over 10²³ possible configurations. 
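A back-of-the-envelope calculation shows how quickly the choices multiply. Using the top-level component counts listed later in this post (3 retrievers, 4 prompt strategies, 4 text splitters, 11 embedding models, 9 LLMs):

```python
# Back-of-the-envelope: multiply the top-level component counts from this
# post's search space (3 retrievers, 4 prompt strategies, 4 text splitters,
# 11 embedding models, 9 synthesizing LLMs).
from math import prod

choices = {
    "retriever": 3,
    "prompt_strategy": 4,
    "text_splitter": 4,
    "embedding_model": 11,
    "synthesizing_llm": 9,
}

base = prod(choices.values())
print(base)  # 4752 combinations before any hyperparameters
```

Those 4,752 combinations cover only the top-level picks; hierarchical hyperparameters and the choice among agentic and non-agentic flow types multiply the total further, which is how the space reaches roughly 10²³ configurations.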

Trial-and-error doesn’t scale. And model-level benchmarks don’t reflect how components behave when stitched into full systems.

That’s why we built syftr — an open source framework for automatically identifying Pareto-optimal workflows across accuracy, cost, and latency constraints.

The complexity behind generative AI workflows

To illustrate how quickly complexity compounds, consider even a relatively simple RAG pipeline like the one shown in Figure 1.

Each component—retriever, prompt strategy, embedding model, text splitter, synthesizing LLM—requires careful selection and tuning. And beyond those decisions, there’s an expanding landscape of end-to-end workflow strategies, from single-agent workflows like ReAct and LATS to multi-agent workflows like CaptainAgent and Magentic-One. 

Figure 1. Even a simple AI workflow requires selecting and testing multiple components and hyperparameters.

What’s missing is a scalable, principled way to explore this configuration space.

That’s where syftr comes in.

Its open source framework uses multi-objective Bayesian Optimization to efficiently search for Pareto-optimal RAG workflows, balancing cost, accuracy, and latency across configurations that would be impossible to test manually.

Benchmarking Pareto-optimal workflows with syftr

Once syftr is applied to a workflow configuration space, it surfaces candidate pipelines that achieve strong tradeoffs across key performance metrics.

The example below shows syftr’s output on the CRAG (Comprehensive RAG) Sports benchmark, highlighting workflows that maintain high accuracy while significantly reducing cost.

Figure 2. syftr searches across a large workflow configuration space to identify Pareto-optimal RAG workflows — agentic and non-agentic — that balance accuracy and cost. On the CRAG Sports benchmark, syftr identifies workflows that match the accuracy of top-performing configurations while reducing cost by nearly two orders of magnitude.

While Figure 2 shows what syftr can deliver, it’s equally important to understand how those results are achieved. 

At the core of syftr is a multi-objective search process designed to efficiently navigate vast workflow configuration spaces. The framework prioritizes both performance and computational efficiency — essential requirements for real-world experimentation at scale.

Figure 3. syftr uses multi-objective Bayesian Optimization (BO) to search across a space of approximately 10²³ unique workflows.

Since evaluating every workflow in this space isn’t feasible, we typically evaluate around 500 workflows per run.

To make this process even more efficient, syftr includes a novel early stopping mechanism — Pareto Pruner — which halts evaluation of workflows that are unlikely to improve the Pareto frontier. This significantly reduces computational cost and search time while preserving result quality. 
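The core idea behind Pareto pruning can be sketched in a few lines. This is an illustration of the concept only, not syftr's actual Pareto Pruner: a candidate is abandoned when even an optimistic estimate of its (accuracy, cost) is already dominated by a point on the current frontier.

```python
# Concept sketch of Pareto pruning (illustrative, not syftr's implementation):
# stop evaluating a candidate whose best plausible (accuracy, cost) is
# already dominated by the current Pareto frontier.

def dominates(a, b):
    """True if a = (accuracy, cost) dominates b: at least as accurate,
    no more costly, and strictly better on at least one objective."""
    acc_a, cost_a = a
    acc_b, cost_b = b
    return acc_a >= acc_b and cost_a <= cost_b and (acc_a > acc_b or cost_a < cost_b)

def should_prune(candidate_bound, frontier):
    """Prune when even an optimistic estimate of the candidate is dominated."""
    return any(dominates(p, candidate_bound) for p in frontier)

frontier = [(0.82, 0.10), (0.90, 1.50)]  # hypothetical (accuracy, cost) points
print(should_prune((0.80, 0.50), frontier))  # True: dominated by (0.82, 0.10)
print(should_prune((0.85, 0.05), frontier))  # False: could extend the frontier
```

Skipping dominated candidates early is what saves most of the evaluation budget, since a full workflow evaluation involves running retrieval and LLM calls end to end.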

Why current benchmarks aren’t enough

While model benchmarks, like MMLU, LiveBench, Chatbot Arena, and the Berkeley Function-Calling Leaderboard, have advanced our understanding of isolated model capabilities, foundation models rarely operate alone in real-world production environments.

Instead, they’re typically one component — albeit an essential one — within larger, sophisticated AI systems.

Measuring intrinsic model performance is critical, but it leaves open key system-level questions:

How do you construct a workflow that meets task-specific goals for accuracy, latency, and cost?

Which models should you use—and in which parts of the pipeline?

syftr addresses this gap by enabling automated, multi-objective evaluation across entire workflows.

It captures nuanced tradeoffs that emerge only when components interact within a broader pipeline, and systematically explores configuration spaces that are otherwise impractical to evaluate manually.

syftr is the first open-source framework specifically designed to automatically identify Pareto-optimal generative AI workflows that balance multiple competing objectives simultaneously — not just accuracy, but latency and cost as well.

It draws inspiration from existing research, including:

AutoRAG, which focuses solely on optimizing for accuracy

Kapoor et al.’s work, AI Agents That Matter, which emphasizes cost-controlled evaluation to prevent incentivizing overly costly, leaderboard-focused agents. This principle serves as one of our core research inspirations.

Importantly, syftr is also orthogonal to LLM-as-optimizer frameworks like Trace and TextGrad, and generic flow optimizers like DSPy. Such frameworks can be combined with syftr to further optimize prompts in workflows. 

In early experiments, syftr first identified Pareto-optimal workflows on the CRAG Sports benchmark.

We then applied Trace to optimize prompts across all of those configurations — taking a two-stage approach: multi-objective workflow search followed by fine-grained prompt tuning.

The result: notable accuracy improvements, especially in low-cost workflows that initially exhibited lower accuracy (those clustered in the lower-left of the Pareto frontier). These gains suggest that post-hoc prompt optimization can meaningfully boost performance, even in highly cost-constrained settings.

This two-stage approach — first multi-objective configuration search, then prompt refinement — highlights the benefits of combining syftr with specialized downstream tools, enabling modular and flexible workflow optimization strategies.

Figure 4. Prompt optimization with Trace further improves Pareto-optimal flows identified by syftr. In the CRAG Sports benchmark shown here, using Trace significantly enhanced the accuracy of lower-cost workflows, shifting the Pareto frontier upward.

Building and extending syftr’s search space

syftr cleanly separates the workflow search space from the underlying optimization algorithm. This modular design enables users to easily extend or customize the space, adding or removing flows, models, and components by editing configuration files.

The default implementation uses Multi-Objective Tree-of-Parzen-Estimators (MOTPE), but syftr supports swapping in other optimization strategies.

Contributions of new flows, modules, or algorithms are welcomed via pull request at github.com/datarobot/syftr.

Figure 5. The current search space includes both agentic workflows (e.g., SubQuestion RAG, Critique RAG, ReAct RAG, LATS) and non-agentic RAG pipelines. Agentic workflows use non-agentic flows as subcomponents. The full space contains ~10²³ configurations.

Built on the shoulders of open source

syftr builds on a number of powerful open source libraries and frameworks:

Ray for distributing and scaling search over large clusters of CPUs and GPUs

Ray Serve for autoscaling model hosting

Optuna for its flexible define-by-run interface (similar to PyTorch’s eager execution) and support for state-of-the-art multi-objective optimization algorithms

LlamaIndex for building sophisticated agentic and non-agentic RAG workflows

HuggingFace Datasets for a fast, collaborative, and uniform dataset interface

Trace for optimizing textual components within workflows, such as prompts

syftr is framework-agnostic: workflows can be constructed using any orchestration library or modeling stack. This flexibility allows users to extend or adapt syftr to fit a wide variety of tooling preferences.

Case study: syftr on CRAG Sports

Benchmark setup

The CRAG benchmark dataset was introduced by Meta for the KDD Cup 2024 and includes three tasks:

Task 1: Retrieval summarization

Task 2: Knowledge graph and web retrieval

Task 3: End-to-end RAG

syftr was evaluated on Task 3 (CRAG3), which includes 4,400 QA pairs spanning a wide range of topics. The official benchmark performs RAG over 50 webpages retrieved for each question. 

To increase difficulty, we combined all webpages across all questions into a single corpus, creating a more realistic, challenging retrieval setting.

Figure 6. Pareto-optimal flows discovered by syftr on CRAG Task 3 (Sports dataset). syftr identifies workflows that are both more accurate and significantly cheaper than a default RAG pipeline built in LlamaIndex (white box). It also outperforms Amazon Q on the same task—an expected result, given that Q is built for general-purpose usage while syftr is tuned for the dataset. This highlights a key insight: custom flows can meaningfully outperform off-the-shelf solutions, especially in cost-sensitive, accuracy-critical applications.

Note: Amazon Q pricing uses a per-user/month pricing model, which differs from the per-query token-based cost estimates used for syftr workflows.

Key observations and insights

Across datasets, syftr consistently surfaces meaningful optimization patterns:

Non-agentic workflows dominate the Pareto frontier. They’re faster and cheaper, leading the optimizer to favor these configurations more frequently than agentic ones.

GPT-4o-mini frequently appears in Pareto-optimal flows, suggesting it offers a strong balance of quality and cost as a synthesizing LLM.

Reasoning models like o3-mini perform well on quantitative tasks (e.g., FinanceBench, InfiniteBench), likely due to their multi-hop reasoning capabilities.

Pareto frontiers eventually flatten after an initial rise, with diminishing returns in accuracy relative to steep cost increases, underscoring the need for tools like syftr that help pinpoint efficient operating points.

We routinely find that the workflow at the knee point of the Pareto frontier loses just a few percentage points in accuracy compared to the most accurate setup — while being 10x cheaper.

syftr makes it easy to find that sweet spot.
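One common heuristic for locating that knee is to take the Pareto point farthest from the straight line joining the frontier's endpoints. This is a sketch of the general idea with toy numbers, not necessarily how syftr selects operating points:

```python
# Knee-point heuristic (illustrative): the Pareto point with maximum
# perpendicular distance from the chord between the frontier's endpoints.

def knee_point(frontier):
    """frontier: list of (cost, accuracy) tuples sorted by cost.
    Returns the interior point farthest from the endpoint-to-endpoint chord."""
    (x0, y0), (x1, y1) = frontier[0], frontier[-1]
    dx, dy = x1 - x0, y1 - y0
    norm = (dx * dx + dy * dy) ** 0.5

    def dist(p):
        x, y = p
        return abs(dy * (x - x0) - dx * (y - y0)) / norm

    return max(frontier[1:-1], key=dist)

# Toy frontier: accuracy rises steeply at first, then flattens as cost grows.
pts = [(0.1, 0.50), (0.5, 0.80), (1.0, 0.82), (5.0, 0.85)]
print(knee_point(pts))  # prints (0.5, 0.8): most accuracy gained per unit cost
```

On a frontier shaped like the ones described above, the knee sits where the curve flattens: the point that gives up a few accuracy points in exchange for an order-of-magnitude cost reduction.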

Cost of running syftr

In our experiments, we allocated a budget of ~500 workflow evaluations per task. Although exact costs vary based on the dataset and search space complexity, we consistently identified strong Pareto frontiers with a one-time search cost of approximately $500 per use case.

We expect this cost to decrease as more efficient search algorithms and space definitions are developed.

Importantly, this initial investment is minimal relative to the long-term gains from deploying optimized workflows, whether through reduced compute usage, improved accuracy, or better user experience in high-traffic systems.

For detailed results across six benchmark tasks, including datasets beyond CRAG, refer to the full syftr paper. 

Getting started and contributing

To get started with syftr, clone or fork the repository on GitHub. Benchmark datasets are available on HuggingFace, and syftr also supports user-defined datasets for custom experimentation.

The current search space includes:

9 proprietary LLMs

11 embedding models

4 general prompt strategies

3 retrievers

4 text splitters (with parameter configurations)

4 agentic RAG flows and 1 non-agentic RAG flow, each with associated hierarchical hyperparameters

New components, such as models, flows, or search modules, can be added or modified via configuration files. Detailed walkthroughs are available to support customization.

syftr is developed fully in the open. We welcome contributions via pull requests, feature proposals, and benchmark reports. We’re particularly interested in ideas that advance the research direction or improve the framework’s extensibility.

What’s ahead for syftr

syftr is still evolving, with several active areas of research designed to extend its capabilities and practical impact:

Meta-learning
Currently, each search is performed from scratch. We’re exploring meta-learning techniques that leverage prior runs across similar tasks to accelerate and guide future searches.

Multi-agent workflow evaluation
While multi-agent systems are gaining traction, they introduce additional complexity and cost. We’re investigating how these workflows compare to single-agent and non-agentic pipelines, and when their tradeoffs are justified.

Composability with prompt optimization frameworks
syftr is complementary to tools like DSPy, Trace, and TextGrad, which optimize textual components within workflows. We’re exploring ways to more deeply integrate these systems to jointly optimize structure and language.

More agentic tasks
We started with question-answer tasks, a critical production use case for agents. Next, we plan to rapidly expand syftr’s task repertoire to code generation, data analysis, and interpretation. We also invite the community to suggest additional tasks for syftr to prioritize.

As these efforts progress, we aim to expand syftr’s value as a research tool, a benchmarking framework, and a practical assistant for system-level generative AI design.

If you’re working in this space, we welcome your feedback, ideas, and contributions.

Try the code, read the research

To explore syftr further, check out the GitHub repository or read the full paper on arXiv for details on methodology and results.

syftr has been accepted to appear at the International Conference on Automated Machine Learning (AutoML) in September 2025 in New York City.

We look forward to seeing what you build and discovering what’s next, together.


