SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

SWE-Perf is a benchmark for evaluating large language models on code performance optimization using data from real-world repositories.

Code performance optimization is paramount in real-world software engineering
and critical for production-level systems. While Large Language Models (LLMs)
have demonstrated impressive capabilities in code generation and bug fixing,
their proficiency in enhancing code performance at the repository level remains
largely unexplored. To address this gap, we introduce SWE-Perf, the first
benchmark specifically designed to systematically evaluate LLMs on code
performance optimization tasks within authentic repository contexts. SWE-Perf
comprises 140 carefully curated instances, each derived from
performance-improving pull requests from popular GitHub repositories. Each
benchmark instance includes the relevant codebase, target functions,
performance-related tests, expert-authored patches, and executable
environments. Through a comprehensive evaluation of representative methods that
span file-level and repo-level approaches (e.g., Agentless and OpenHands), we
reveal a substantial capability gap between existing LLMs and expert-level
optimization performance, highlighting critical research opportunities in this
emerging field.
