Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Former OpenAI board member questions Mark Zuckerberg AI hiring spree – East Bay Times

Luminance Launches AI-Driven Compliance Module – Artificial Lawyer

Paper page – Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Industry AI
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
Facebook X (Twitter) Instagram
Advanced AI News
VentureBeat AI

New 1.5B router model achieves 93% accuracy without costly retraining

By Advanced AI EditorJuly 7, 2025No Comments6 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

Researchers at Katanemo Labs have introduced Arch-Router, a new routing model and framework designed to intelligently map user queries to the most suitable large language model (LLM). 

For enterprises building products that rely on multiple LLMs, Arch-Router aims to solve a key challenge: how to direct queries to the best model for the job without relying on rigid logic or costly retraining every time something changes.

The challenges of LLM routing

As the number of LLMs grows, developers are moving from single-model setups to multi-model systems that use the unique strengths of each model for specific tasks (e.g., code generation, text summarization, or image editing). 

LLM routing has emerged as a key technique for building and deploying these systems, acting as a traffic controller that directs each user query to the most appropriate model.

Existing routing methods generally fall into two categories: “task-based routing,” where queries are routed based on predefined tasks, and “performance-based routing,” which seeks an optimal balance between cost and performance.

However, task-based routing struggles with unclear or shifting user intentions, particularly in multi-turn conversations. Performance-based routing, on the other hand, rigidly prioritizes benchmark scores, often neglects real-world user preferences and adapts poorly to new models unless it undergoes costly fine-tuning.

More fundamentally, as the Katanemo Labs researchers note in their paper, “existing routing approaches have limitations in real-world use. They typically optimize for benchmark performance while neglecting human preferences driven by subjective evaluation criteria.” 

The researchers highlight the need for routing systems that “align with subjective human preferences, offer more transparency, and remain easily adaptable as models and use cases evolve.”

A new framework for preference-aligned routing

To address these limitations, the researchers propose a “preference-aligned routing” framework that matches queries to routing policies based on user-defined preferences.

In this framework, users define their routing policies in natural language using a “Domain-Action Taxonomy.” This is a two-level hierarchy that reflects how people naturally describe tasks, starting with a general topic (the Domain, such as “legal” or “finance”) and narrowing to a specific task (the Action, such as “summarization” or “code generation”). 

Each of these policies is then linked to a preferred model, allowing developers to make routing decisions based on real-world needs rather than just benchmark scores. As the paper states, “This taxonomy serves as a mental model to help users define clear and structured routing policies.”

The routing process happens in two stages. First, a preference-aligned router model takes the user query and the full set of policies and selects the most appropriate policy. Second, a mapping function connects that selected policy to its designated LLM. 

Because the model selection logic is separated from the policy, models can be added, removed, or swapped simply by editing the routing policies, without any need to retrain or modify the router itself. This decoupling provides the flexibility required for practical deployments, where models and use cases are constantly evolving.

Preference-aligned routing framework (source: arXiv)
Preference-aligned routing framework Source: arXiv

The policy selection is powered by Arch-Router, a compact 1.5B parameter language model fine-tuned for preference-aligned routing. Arch-Router receives the user query and the complete set of policy descriptions within its prompt. It then generates the identifier of the best-matching policy. 

Since the policies are part of the input, the system can adapt to new or modified routes at inference time through in-context learning and without retraining. This generative approach allows Arch-Router to use its pre-trained knowledge to understand the semantics of both the query and the policies, and to process the entire conversation history at once.

A common concern with including extensive policies in a prompt is the potential for increased latency. However, the researchers designed Arch-Router to be highly efficient. “While the length of routing policies can get long, we can easily increase the context window of Arch-Router with minimal impact on latency,” explains Salman Paracha, co-author of the paper and Founder/CEO of Katanemo Labs. He notes that latency is primarily driven by the length of the output, and for Arch-Router, the output is simply the short name of a routing policy, like “image_editing” or “document_creation.”

Arch-Router in action

To build Arch-Router, the researchers fine-tuned a 1.5B parameter version of the Qwen 2.5 model on a curated dataset of 43,000 examples. They then tested its performance against state-of-the-art proprietary models from OpenAI, Anthropic and Google on four public datasets designed to evaluate conversational AI systems.

The results show that Arch-Router achieves the highest overall routing score of 93.17%, surpassing all other models, including top proprietary ones, by an average of 7.71%. The model’s advantage grew with longer conversations, demonstrating its strong ability to track context over multiple turns. 

Arch-Router vs other models (source: arXiv)
Arch-Router vs other models Source: arXiv

In practice, this approach is already being applied in several scenarios, according to Paracha. For example, in open-source coding tools, developers use Arch-Router to direct different stages of their workflow, such as “code design,” “code understanding,” and “code generation,” to the LLMs best suited for each task. Similarly, enterprises can route document creation requests to a model like Claude 3.7 Sonnet while sending image editing tasks to Gemini 2.5 Pro. 

The system is also ideal “for personal assistants in various domains, where users have a diversity of tasks from text summarization to factoid queries,” Paracha said, adding that “in those cases, Arch-Router can help developers unify and improve the overall user experience.”

This framework is integrated with Arch, Katanemo Labs’ AI-native proxy server for agents, which allows developers to implement sophisticated traffic-shaping rules. For instance, when integrating a new LLM, a team can send a small portion of traffic for a specific routing policy to the new model, verify its performance with internal metrics, and then fully transition traffic with confidence. The company is also working to integrate its tools with evaluation platforms to streamline this process for enterprise developers further.

Ultimately, the goal is to move beyond siloed AI implementations. “Arch-Router—and Arch more broadly—helps developers and enterprises move from fragmented LLM implementations to a unified, policy-driven system,” says Paracha. “In scenarios where user tasks are diverse, our framework helps turn that task and LLM fragmentation into a unified experience, making the final product feel seamless to the end user.”

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleCursor apologizes for unclear pricing changes that upset users
Next Article Maximize IBM Power11 with IBM Technology Lifecycle Services
Advanced AI Editor
  • Website

Related Posts

Facing AI-powered threats, CISOs consolidate around single-vendor SASE

July 8, 2025

Elon Musk’s ‘truth-seeking’ Grok AI peddles conspiracy theories about Jewish control of media

July 7, 2025

How Capital One built production multi-agent AI workflows to power enterprise use cases

July 7, 2025
Leave A Reply

Latest Posts

Prospect New Orleans Will Not Mount Next Edition in 2027

Rising Painter Dies at 43

Confederate Group Sues Stone Mountain Over Show on Racism and Slavery

UK MPs to Debate Banning Advertising by Oil Companies

Latest Posts

Former OpenAI board member questions Mark Zuckerberg AI hiring spree – East Bay Times

July 8, 2025

Luminance Launches AI-Driven Compliance Module – Artificial Lawyer

July 8, 2025

Paper page – Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing

July 8, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Former OpenAI board member questions Mark Zuckerberg AI hiring spree – East Bay Times
  • Luminance Launches AI-Driven Compliance Module – Artificial Lawyer
  • Paper page – Beyond Simple Edits: X-Planner for Complex Instruction-Based Image Editing
  • Effective cross-lingual LLM evaluation with Amazon Bedrock
  • How this city, home to DeepSeek, became China’s AI hub

Recent Comments

No comments to show.

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

YouTube LinkedIn
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.