“This isn’t the first time we’ve seen outcomes that suggest some of the ambitious ideas about AI agents directly replacing developers are pretty far from reality. There have been numerous studies already showing that even though an AI tool can sometimes create an application that seems acceptable to the user for a narrow task, the models tend to produce code laden with bugs and security vulnerabilities, and they aren’t generally capable of fixing those problems…. most researchers agree it remains likely that the best outcome is an agent that saves a human developer a substantial amount of time, not one that can do everything they can do.”
“those claiming we’re mere months away from AI agents replacing most programmers should adjust their expectations because models aren’t good enough at the debugging part, and debugging occupies most of a developer’s time”
These quotes are from a new essay at Ars Technica about a new Microsoft study on the trouble of getting AI to debug reliably. The study confirms one of the core claims in my recent critique of Hard Fork’s Kevin Roose on vibe coding: debugging is hard, it is a big part of what coders do, and it is not about to be replaced.

Another relevant recent quote comes from Sir Demis Hassabis. He was speaking about agents in general, but it applies to fantasies about vibe coding agents, too: “If your AI model has a 1% error rate and you plan over 5,000 steps, that 1% compounds like compound interest.”
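To make the compounding concrete, here is a back-of-the-envelope sketch in Python. It assumes, as a simplification the quote does not itself state, that every step fails independently with the same probability and that a single failure derails the whole plan. The function names are hypothetical, chosen only for illustration.

```python
import math


def success_probability(per_step_error: float, steps: int) -> float:
    """Chance that every one of `steps` steps succeeds.

    Assumption (not from the quote): errors are independent and any
    single error ruins the plan, so the per-step success rates multiply.
    """
    return (1.0 - per_step_error) ** steps


def steps_until_below(per_step_error: float, threshold: float) -> int:
    """Smallest number of steps at which overall success drops below `threshold`."""
    # Solve (1 - e)^n < threshold for n by taking logs of both sides.
    return math.floor(math.log(threshold) / math.log(1.0 - per_step_error)) + 1


if __name__ == "__main__":
    for n in (10, 100, 1000, 5000):
        print(f"{n:>5} steps: P(all correct) = {success_probability(0.01, n):.3g}")
    print("Steps until success is below 50%:", steps_until_below(0.01, 0.5))
```

Under those assumptions, a 1% per-step error rate leaves only about a 37% chance of a clean run after 100 steps, drops below a coin flip at 69 steps, and is effectively zero long before 5,000 steps. Real agents can sometimes catch and repair their own mistakes, so this is an illustration of the arithmetic, not a measurement.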
The only way we are going to get past this kind of 80:20, sometimes-it-works, sometimes-it-doesn’t AI is to change the paradigm.
Gary Marcus is sorry to have to repeat himself. But the big change we need still hasn’t come.