Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Mayo Clinic deploys NVIDIA AI to transform medicine | Health

No more links, no more scrolling—The browser is becoming an AI Agent

AI data analyst startup Julius nabs $10M seed round

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Industry AI
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
TechCrunch AI

AI models still struggle to debug software, Microsoft study shows

By Advanced AI EditorApril 10, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


AI models from OpenAI, Anthropic, and other top AI labs are increasingly being used to assist with programming tasks. Google CEO Sundar Pichai said in October that 25% of new code at the company is generated by AI, and Meta CEO Mark Zuckerberg has expressed ambitions to widely deploy AI coding models within the social media giant.

Yet even some of the best models today struggle to resolve software bugs that wouldn’t trip up experienced devs.

A new study from Microsoft Research, Microsoft’s R&D division, reveals that models, including Anthropic’s Claude 3.7 Sonnet and OpenAI’s o3-mini, fail to debug many issues in a software development benchmark called SWE-bench Lite. The results are a sobering reminder that, despite bold pronouncements from companies like OpenAI, AI is still no match for human experts in domains such as coding.

The study’s co-authors tested nine different models as the backbone for a “single prompt-based agent” that had access to a number of debugging tools, including a Python debugger. They tasked this agent with solving a curated set of 300 software debugging tasks from SWE-bench Lite.

According to the co-authors, even when equipped with stronger and more recent models, their agent rarely completed more than half of the debugging tasks successfully. Claude 3.7 Sonnet had the highest average success rate (48.4%), followed by OpenAI’s o1 (30.2%) and o3-mini (22.1%).

Microsoft AI debugging benchmark
A chart from the study. The “relative increase” refers to the boost models got from being equipped with debugging tooling.Image Credits:Microsoft

Why the underwhelming performance? Some models struggled to use the debugging tools available to them and understand how different tools might help with different issues. The bigger problem, though, was data scarcity, according to the co-authors. They speculate that there’s not enough data representing “sequential decision-making processes” — that is, human debugging traces — in current models’ training data.

“We strongly believe that training or fine-tuning [models] can make them better interactive debuggers,” wrote the co-authors in their study. “However, this will require specialized data to fulfill such model training, for example, trajectory data that records agents interacting with a debugger to collect necessary information before suggesting a bug fix.”

The findings aren’t exactly shocking. Many studies have shown that code-generating AI tends to introduce security vulnerabilities and errors, owing to weaknesses in areas like the ability to understand programming logic. One recent evaluation of Devin, a popular AI coding tool, found that it could only complete three out of 20 programming tests.

But the Microsoft work is one of the more detailed looks yet at a persistent problem area for models. It likely won’t dampen investor enthusiasm for AI-powered assistive coding tools, but with any luck, it’ll make developers — and their higher-ups — think twice about letting AI run the coding show.

For what it’s worth, a growing number of tech leaders have disputed the notion that AI will automate away coding jobs. Microsoft co-founder Bill Gates has said he thinks programming as a profession is here to stay. So has Replit CEO Amjad Masad, Okta CEO Todd McKinnon, and IBM CEO Arvind Krishna.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleNeurIPS 2023 Poster Session 1 (Tuesday Evening)
Next Article NTT launches physics of AI group and AI inference chip design for 4K video
Advanced AI Editor
  • Website

Related Posts

AI data analyst startup Julius nabs $10M seed round

July 29, 2025

Harmonic, the Robinhood CEO’s AI math startup, launches an AI chatbot app

July 28, 2025

Microsoft Edge is now an AI browser with launch of ‘Copilot Mode’

July 28, 2025
Leave A Reply

Latest Posts

Picasso’s ‘Demoiselles’ May Not Have Been Inspired by African Art

Catalan National Assembly protested the restitution of murals to Aragon.

UNESCO Adds 26 Sites to World Heritage List

Aspen Art Fair Doubles in Size for 2025 Edition

Latest Posts

Mayo Clinic deploys NVIDIA AI to transform medicine | Health

July 29, 2025

No more links, no more scrolling—The browser is becoming an AI Agent

July 29, 2025

AI data analyst startup Julius nabs $10M seed round

July 29, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Mayo Clinic deploys NVIDIA AI to transform medicine | Health
  • No more links, no more scrolling—The browser is becoming an AI Agent
  • AI data analyst startup Julius nabs $10M seed round
  • Bell partners with AI company Cohere in latest step to grow tech service offerings
  • Agentic AI Sets the Tone at TPC25’s Hackathon and Tutorial Plenary Session

Recent Comments

  1. binance kód on Anthropic closes $2.5 billion credit facility as Wall Street continues plunging money into AI boom – NBC Los Angeles
  2. 🖨 🔵 Incoming Message: 1.95 Bitcoin from exchange. Claim transfer => https://graph.org/ACTIVATE-BTC-TRANSFER-07-23?hs=40f06aae45d2dc14b01045540f836756& 🖨 on SFC Dialogue丨Jeffrey Sachs says he uses DeepSeek every hour_to_facts_its
  3. 📪 ✉️ Unread Notification: 1.65 BTC from user. Claim transfer >> https://graph.org/ACTIVATE-BTC-TRANSFER-07-23?hs=63f0a8159ef8316c31f5a9a8aca50f39& 📪 on Sean Carroll: Arrow of Time
  4. 🔋 📬 Unread Alert - 1.65 BTC from exchange. Accept funds > https://graph.org/ACTIVATE-BTC-TRANSFER-07-23?hs=db3ef91843302da628b83636ef7db949& 🔋 on Rohit Prasad: Amazon Alexa and Conversational AI | Lex Fridman Podcast #57
  5. 📟 ✉️ New Alert: 1.95 Bitcoin from partner. Review funds => https://graph.org/ACTIVATE-BTC-TRANSFER-07-23?hs=945d7d4685640a791a641ab7baaf111d& 📟 on OpenAI’s $3 Billion Windsurf Acquisition Changes AI Forever

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.