In recent years, multimodal models have made remarkable strides, paving the way
for intelligent browser-use agents. However, when solving tasks on real-world
webpages over multi-turn, long-horizon trajectories, current agents still suffer
from disordered action sequencing and excessive trial and error during
execution. This paper introduces Recon-Act, a self-evolving multi-agent
framework grounded in the Reconnaissance-Action behavioral paradigm. The system
comprises a Reconnaissance Team and an Action Team: the former conducts
comparative analysis and tool generation, while the latter handles intent
decomposition, tool orchestration, and execution. By contrasting erroneous
trajectories with successful ones, the Reconnaissance Team infers remedies and
abstracts them into a unified notion of generalized tools, expressed either as
hints or as rule-based code, which are registered to the tool archive in real
time. The Action Team then re-runs inference, empowered by these targeted
tools, thus establishing a closed-loop data-tools-action-feedback training
pipeline. Following the six-level implementation roadmap
proposed in this work, we have currently reached Level 3 (with limited
human-in-the-loop intervention). Leveraging generalized tools obtained through
reconnaissance, Recon-Act substantially improves adaptability to unseen
websites and the solvability of long-horizon tasks, and achieves state-of-the-art
performance on the challenging VisualWebArena dataset.