Dyna-Mind: Learning To Simulate From Experience For Better AI Agents - Takara TLDR

Reasoning models have recently shown remarkable progress in domains such as
math and coding. However, their expert-level abilities in math and coding
contrast sharply with their performance in long-horizon, interactive tasks such
as web navigation and computer/phone-use. Inspired by literature on human
cognition, we argue that current AI agents need “vicarious trial and error” –
the capacity to mentally simulate alternative futures before acting – in order
to enhance their understanding and performance in complex interactive
environments. We introduce Dyna-Mind, a two-stage training framework that
explicitly teaches (V)LM agents to integrate such simulation into their
reasoning. In stage 1, we introduce Reasoning with Simulations (ReSim), which
trains the agent to generate structured reasoning traces from expanded search
trees built from real experience gathered through environment interactions.
ReSim thus grounds the agent’s reasoning in faithful world dynamics and equips
it with the ability to anticipate future states in its reasoning. In stage 2,
we propose Dyna-GRPO, an online reinforcement learning method to further
strengthen the agent’s simulation and decision-making ability by using both
outcome rewards and intermediate states as feedback from real rollouts.
Experiments on two synthetic benchmarks (Sokoban and ALFWorld) and one
realistic benchmark (AndroidWorld) demonstrate that (1) ReSim effectively
infuses simulation ability into AI agents, and (2) Dyna-GRPO leverages outcome
and interaction-level signals to learn better policies for long-horizon,
planning-intensive tasks. Together, these results highlight the central role of
simulation in enabling AI agents to reason, plan, and act more effectively in
increasingly challenging environments.
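
To make "simulate alternative futures before acting" concrete, here is a minimal sketch of the idea in a toy setting. It is not the Dyna-Mind training procedure: the abstract describes a (V)LM that learns to simulate inside its own reasoning, whereas this example assumes a hand-written transition function stands in for that learned simulation. The corridor world and every name in it (`State`, `step`, `plan_value`, `choose_action`, `run_episode`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


# Hypothetical toy world (an assumption, not from the paper):
# an agent on a 1-D corridor has to reach a goal cell.
@dataclass(frozen=True)
class State:
    pos: int
    goal: int
    size: int


ACTIONS = (-1, +1)  # step left, step right


def step(state: State, action: int) -> State:
    """Toy transition function: move one cell, clamped to the corridor."""
    new_pos = min(max(state.pos + action, 0), state.size - 1)
    return State(new_pos, state.goal, state.size)


def score(state: State) -> int:
    """Higher is better: negative distance to the goal."""
    return -abs(state.goal - state.pos)


def plan_value(state: State, plan: tuple) -> int:
    """Imagine executing `plan` and sum the scores of the imagined states."""
    total = 0
    for action in plan:
        state = step(state, action)
        total += score(state)
    return total


def choose_action(state: State, depth: int = 3) -> int:
    """Evaluate every imagined plan of length `depth`, then commit only
    to the first action of the best one."""
    best_plan = max(
        product(ACTIONS, repeat=depth),
        key=lambda plan: plan_value(state, plan),
    )
    return best_plan[0]


def run_episode(start: State, max_steps: int = 20) -> list:
    """Act in the 'real' environment, re-simulating before every step."""
    state, trajectory = start, [start.pos]
    for _ in range(max_steps):
        if state.pos == state.goal:
            break
        state = step(state, choose_action(state))
        trajectory.append(state.pos)
    return trajectory


if __name__ == "__main__":
    print(run_episode(State(pos=1, goal=4, size=6)))
```

The agent here re-plans at every step and executes only the first action of its best imagined plan. In this toy the simulator is exact by construction; an agent that has to learn its simulation has no such guarantee, which is why the abstract stresses grounding it in real environment interactions.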
