DiaFORGE is a disambiguation framework that enhances large language models’ ability to invoke enterprise APIs accurately through dialogue synthesis, supervised fine-tuning, and real-world evaluation.
Large language models (LLMs) are increasingly tasked with invoking enterprise
APIs, yet they routinely falter when near-duplicate tools vie for the same user
intent or when required arguments are left underspecified. We introduce
DiaFORGE (Dialogue Framework for Organic Response Generation & Evaluation), a
disambiguation-centric, three-stage pipeline that (i) synthesizes
persona-driven, multi-turn dialogues in which the assistant must distinguish
among highly similar tools, (ii) performs supervised fine-tuning of open-source
models spanning 3B to 70B parameters with reasoning traces, and (iii) evaluates
real-world readiness via a dynamic suite that redeploys each model in a live
agentic loop and reports end-to-end goal completion alongside conventional
static metrics. On our dynamic benchmark DiaBENCH, models trained with DiaFORGE
raise tool-invocation success by 27 pp over GPT-4o and by 49 pp over
Claude-3.5-Sonnet, both under optimized prompting. To spur further research, we
release an open corpus of 5000 production-grade enterprise API specifications
paired with rigorously validated, disambiguation-focused dialogues, offering a
practical blueprint for building reliable, enterprise-ready tool-calling
agents.
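
The dynamic evaluation stage (iii) can be pictured as a harness that replays user turns against the model, lets it either ask clarifying questions or commit to a tool call, and scores only end-to-end goal completion. The sketch below is a hypothetical illustration under assumed interfaces, not the DiaBENCH harness itself; `ToolSpec`, `ModelReply`, `Episode`, `run_episode`, and `goal_completion_rate` are all invented names, and the scripted user turns stand in for whatever user simulation a real agentic loop would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol

@dataclass
class ToolSpec:
    name: str
    description: str
    handler: Callable[[dict], str]          # stub for the real enterprise API call

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class ModelReply:
    text: str                               # natural-language turn (may be a clarifying question)
    tool_call: Optional[ToolCall] = None    # set once the model commits to an API

class Assistant(Protocol):
    def generate(self, history: list[dict], tools: dict[str, ToolSpec]) -> ModelReply: ...

@dataclass
class Episode:
    user_turns: list[str]                   # scripted stand-in for a live user simulator
    goal_tool: str                          # the one correct API among near-duplicates
    goal_args: dict                         # arguments the assistant must elicit

def run_episode(model: Assistant, tools: dict[str, ToolSpec], episode: Episode) -> bool:
    """Drive one dialogue in a loop; return True only on end-to-end goal completion."""
    history: list[dict] = []
    for turn in episode.user_turns:
        history.append({"role": "user", "content": turn})
        reply = model.generate(history, tools)
        history.append({"role": "assistant", "content": reply.text})
        if reply.tool_call is not None:     # the model committed to an invocation
            if (reply.tool_call.name == episode.goal_tool
                    and reply.tool_call.args == episode.goal_args):
                # execute the tool and feed the result back, as a live agent would
                result = tools[reply.tool_call.name].handler(reply.tool_call.args)
                history.append({"role": "tool", "content": result})
                return True
            return False                    # wrong tool or wrong/missing arguments
    return False                            # never committed to any tool

def goal_completion_rate(model: Assistant, tools: dict[str, ToolSpec],
                         episodes: list[Episode]) -> float:
    """Aggregate end-to-end success across benchmark episodes."""
    return sum(run_episode(model, tools, ep) for ep in episodes) / len(episodes)
```

In this framing, selecting a near-duplicate tool or invoking the right tool before its required arguments have been elicited both count as failures, which is what separates an end-to-end goal-completion metric from static argument-matching scores.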