AI Isn’t Ready To Replace Human Coders For Debugging, Researchers Say

A graph showing agents with tools nearly doubling the success rates of those without, but still scoring under 50 percent. Agents using debugging tools drastically outperformed those that didn't, but their success rate still wasn't high enough.

Credit: Microsoft Research

This approach is much more successful than relying on the models as they're usually used, but when your best case is a 48.4 percent success rate, you're not ready for primetime. The limitations likely stem from two factors: the models don't fully understand how best to use the tools, and their current training data isn't tailored to this use case.

“We believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in the current LLM training corpus,” the blog post says. “However, the significant performance improvement… validates that this is a promising research direction.”

The post describes this initial report as just the start of the effort. The next step is to "fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs." If the main model is large, the best way to save on inference costs may be to "build a smaller info-seeking model that can provide relevant information to the larger one."

This isn't the first time we've seen outcomes suggesting that some of the more ambitious ideas about AI agents directly replacing developers are far from reality. Numerous studies have already shown that even though an AI tool can sometimes create an application that seems acceptable to the user for a narrow task, the models tend to produce code laden with bugs and security vulnerabilities, and they generally aren't capable of fixing those problems.

This is an early step on the path to AI coding agents, but most researchers agree that the likeliest outcome is an agent that saves a human developer a substantial amount of time, not one that can do everything a developer can do.
