Paper Page - AgentTTS: Large Language Model Agent For Test-time Compute-optimal Scaling Strategy In Complex Tasks

AgentTTS, an LLM-agent-based framework, optimizes compute allocation for multi-stage complex tasks, improving performance and robustness compared to traditional methods.

Test-time scaling (TTS) enhances the performance of large language models
(LLMs) by allocating additional compute resources during inference. However,
existing research primarily investigates TTS in single-stage tasks; while many
real-world problems are multi-stage complex tasks, composed of a sequence of
heterogeneous subtasks with each subtask requires LLM of specific capability.
Therefore, we study a novel problem: the test-time compute-optimal scaling in
multi-stage complex tasks, aiming to select suitable models and allocate
budgets per subtask to maximize overall performance. TTS in multi-stage tasks
introduces two fundamental challenges: (i) The combinatorial search space of
model and budget allocations, combined with the high cost of inference, makes
brute-force search impractical. (ii) The optimal model and budget allocations
across subtasks are interdependent, increasing the complexity of the
compute-optimal search. To address this gap, we conduct extensive pilot
experiments on four tasks across six datasets, deriving three empirical
insights characterizing the behavior of LLMs in multi-stage complex tasks.
Informed by these insights, we propose AgentTTS, an LLM-agent-based framework
that autonomously searches for compute-optimal allocations through iterative
feedback-driven interactions with the execution environment. Experimental
results demonstrate that AgentTTS significantly outperforms traditional and
other LLM-based baselines in search efficiency, and shows improved robustness
to varying training set sizes and enhanced interpretability.

Source link

What's Hot

Is vibe coding ruining a generation of engineers?

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions – Takara TLDR

AI Systems Can Be Fooled by Fake Dates, Giving Newer Content Unfair Visibility

Paper page – AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions – Takara TLDR

The Alignment Waltz: Jointly Training Agents to Collaborate for Safety – Takara TLDR

SViM3D: Stable Video Material Diffusion for Single Image 3D Generation – Takara TLDR

The Rubin Names 2025 Art Prize, Research and Art Projects Grants

Kochi-Muziris Biennial Announces 66 Artists for December Exhibition

Instagram Launches ‘Rings’ Awards for Creators—With KAWS as a Judge

Museums Prepare to Close Their Doors as Government Shutdown Continues

Is vibe coding ruining a generation of engineers?

LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions – Takara TLDR

AI Systems Can Be Fooled by Fake Dates, Giving Newer Content Unfair Visibility

What's Hot

Paper page – AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks

Related Posts

Subscribe to Updates