Advanced AI News
OpenAI

OpenAI’s Codex is part of a new cohort of agentic coding tools

By Advanced AI Editor | May 20, 2025 | 4 Mins Read


Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape.

From GitHub’s early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as an exceptionally intelligent form of autocomplete. The tools generally live in an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it’s finished is largely out of reach. 

But these new agentic coding tools, led by products like Devin, SWE-Agent, OpenHands, and the aforementioned OpenAI Codex, are designed to work without users ever having to see the code. The goal is to operate like the manager of an engineering team, assigning issues through workplace systems like Asana or Slack and checking in when a solution has been reached. 

For believers in forms of highly capable AI, it’s the next logical step in a natural progression of automation taking over more and more software work.

“In the beginning, people just wrote code by pressing every single keystroke,” explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. “GitHub Copilot was the first product that offered real auto-complete, which is kind of stage two. You’re still absolutely in the loop, but sometimes you can take a shortcut.” 

The goal for agentic systems is to move beyond developer environments entirely, instead presenting coding agents with an issue and leaving them to resolve it on their own. “We pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously,” says Lieret.

It’s an ambitious aim, and so far, it’s proven difficult.

After Devin became generally available at the end of 2024, it drew scathing criticism from YouTube pundits, as well as a more measured critique from an early client at Answer.AI. The overall impression was a familiar one for vibe-coding veterans: with so many errors, overseeing the models takes as much work as doing the task manually. (While Devin’s rollout has been a bit rocky, it hasn’t stopped investors from recognizing the potential – in March, Devin’s parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.)

Even supporters of the technology caution against unsupervised vibe-coding, seeing the new coding agents as powerful elements in a human-supervised development process.

“Right now, and I would say, for the foreseeable future, a human has to step in at code review time to look at the code that’s been written,” says Robert Brennan, the CEO of All Hands AI, which maintains OpenHands. “I’ve seen several people work themselves into a mess by just auto-approving every bit of code that the agent writes. It gets out of hand fast.”

Hallucinations are an ongoing problem as well. Brennan recalls one incident in which, when asked about an API that had been released after the OpenHands agent’s training data cutoff, the agent fabricated details of an API that fit the description. All Hands AI says it’s working on systems to catch these hallucinations before they can cause harm, but there isn’t a simple fix.

Arguably the best measure of agentic programming progress is the set of SWE-Bench leaderboards, where developers can test their models against a set of unresolved issues from open GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that one of the models powering Codex, codex-1, can do better, listing a 72.1% score in its announcement – although the score came with a few caveats and hasn’t been independently verified.

The concern among many in the tech industry is that high benchmark scores don’t necessarily translate to truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they’re going to require significant oversight from human developers – particularly when tackling complex systems with multiple stages.
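To see why multi-stage work makes the oversight problem worse, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: it assumes every stage succeeds independently at the same rate, which is a simplification introduced here and not a claim made by anyone quoted in this article. The function name `chain_success_rate` is hypothetical.

```python
def chain_success_rate(per_stage_rate: float, stages: int) -> float:
    """Chance that every stage succeeds.

    Assumption (not from the article): stages are independent and
    share the same success rate, so the rates simply multiply.
    """
    if not 0.0 <= per_stage_rate <= 1.0:
        raise ValueError("per_stage_rate must be between 0 and 1")
    if stages < 1:
        raise ValueError("stages must be at least 1")
    return per_stage_rate ** stages


if __name__ == "__main__":
    # 0.75 mirrors the "three out of every four problems" figure above.
    for stages in (1, 2, 3, 5):
        rate = chain_success_rate(0.75, stages)
        print(f"{stages} stage(s): about {rate:.0%} chance of a clean run")
```

Under that assumption, an agent that gets three out of four individual steps right would complete a three-stage task without any error only about 42% of the time, which is why a human check at each stage matters more as tasks get longer.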

As with most AI tools, the hope is that improvements to foundation models will come at a steady pace, eventually enabling agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial for getting there.

“I think there is a little bit of a sound barrier effect,” Brennan says. “The question is, how much trust can you shift to the agents, so they take more out of your workload at the end of the day?”


