Paper page - Benchmarking LLMs' Swarm intelligence

Large Language Models (LLMs) show potential for complex reasoning, yet their
capacity for emergent coordination in Multi-Agent Systems (MAS) remains largely
unexplored when agents operate under the strict constraints characteristic of
natural swarms, such as limited local perception and communication. The nuances
of swarm intelligence in particular have received little attention. Existing benchmarks often do not
fully capture the unique challenges of decentralized coordination that arise
when agents operate with incomplete spatio-temporal information. To bridge this
gap, we introduce SwarmBench, a novel benchmark designed to systematically
evaluate the swarm intelligence capabilities of LLMs acting as decentralized
agents. SwarmBench features five foundational MAS coordination tasks within a
configurable 2D grid environment, forcing agents to rely primarily on local
sensory input (a k × k view) and local communication. We propose metrics for
coordination effectiveness and analyze emergent group dynamics. Evaluating
several leading LLMs in a zero-shot setting, we find significant performance
variations across tasks, highlighting the difficulties posed by local
information constraints. While some coordination emerges, results indicate
limitations in robust planning and strategy formation under uncertainty in
these decentralized scenarios. Assessing LLMs under swarm-like conditions is
crucial for realizing their potential in future decentralized systems. We
release SwarmBench as an open, extensible toolkit, built upon a customizable and
scalable physical system with defined mechanical properties. It provides
environments, prompts, evaluation scripts, and the comprehensive datasets
generated in our experiments, aiming to foster reproducible research into LLM-based MAS
coordination and the theoretical underpinnings of Embodied MAS. Our code
repository is available at https://github.com/x66ccff/swarmbench.
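
To make the local perception constraint concrete, the following minimal sketch shows how a k × k view centered on one agent could be cut out of a 2D grid. It is an illustration only and is not taken from the SwarmBench code base: the function name `local_view`, the string-based grid encoding, and the `#` padding for out-of-bounds cells are all assumptions made for this example.

```python
from typing import List


def local_view(grid: List[str], row: int, col: int, k: int, pad: str = "#") -> List[str]:
    """Return the k x k window of `grid` centered on (row, col).

    Hypothetical helper: cells that fall outside the grid are filled
    with `pad`, so the agent cannot tell what lies beyond its window.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so the agent sits at the center")
    half = k // 2
    height, width = len(grid), len(grid[0])
    view = []
    for r in range(row - half, row + half + 1):
        cells = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < height and 0 <= c < width
            cells.append(grid[r][c] if inside else pad)
        view.append("".join(cells))
    return view


# Toy grid (assumed encoding): "." is empty, "A" and "B" are agents.
grid = [
    ".....",
    ".A.B.",
    ".....",
]

# Agent A at (1, 1) with k = 3 sees only its immediate neighbors.
view = local_view(grid, 1, 1, 3)
print("\n".join(view))
```

In this toy grid, agent B sits two columns away from A and therefore falls outside A's 3 × 3 window, which is the kind of incomplete information the benchmark is built around: A would have to learn about B through movement or local communication.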
