Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

I’m back at it: 1,000 total push-ups, pull-ups, squats every day

Inclusion Strategies in Workplace | Recruiting News Network

EU Commission: “AI Gigafactories” to strengthen Europe as a business location

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • Adobe Sensi
    • Aleph Alpha
    • Alibaba Cloud (Qwen)
    • Amazon AWS AI
    • Anthropic (Claude)
    • Apple Core ML
    • Baidu (ERNIE)
    • ByteDance Doubao
    • C3 AI
    • Cohere
    • DataRobot
    • DeepSeek
  • AI Research & Breakthroughs
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Education AI
    • Energy AI
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Media & Entertainment
    • Transportation AI
    • Manufacturing AI
    • Retail AI
    • Agriculture AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
Advanced AI News
Home » OpenAI’s Codex is part of a new cohort of agentic coding tools
OpenAI

OpenAI’s Codex is part of a new cohort of agentic coding tools

Advanced AI BotBy Advanced AI BotMay 20, 2025No Comments4 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Last Friday, OpenAI introduced a new coding system called Codex, designed to perform complex programming tasks from natural language commands. Codex moves OpenAI into a new cohort of agentic coding tools that is just beginning to take shape.

From GitHub’s early Copilot to contemporary tools like Cursor and Windsurf, most AI coding assistants operate as an exceptionally intelligent form of autocomplete. The tools generally live in an integrated development environment, and users interact directly with the AI-generated code. The prospect of simply assigning a task and returning when it’s finished is largely out of reach. 

But these new agentic coding tools, led by products like Devin, SWE-Agent, OpenHands, and the aforementioned OpenAI Codex, are designed to work without users ever having to see the code. The goal is to operate like the manager of an engineering team, assigning issues through workplace systems like Asana or Slack and checking in when a solution has been reached. 

For believers in forms of highly capable AI, it’s the next logical step in a natural progression of automation taking over more and more software work.

“In the beginning, people just wrote code by pressing every single keystroke,” explains Kilian Lieret, a Princeton researcher and member of the SWE-Agent team. “GitHub Copilot was the first product that offered real auto-complete, which is kind of stage two. You’re still absolutely in the loop, but sometimes you can take a shortcut.” 

The goal for agentic systems is to move beyond developer environments entirely, instead presenting coding agents with an issue and leaving them to resolve it on their own. “We pull things back to the management layer, where I just assign a bug report and the bot tries to fix it completely autonomously,” says Lieret.

It’s an ambitious aim, and so far, it’s proven difficult.

After Devin became generally available at the end of 2024, it drew scathing criticism from YouTube pundits, as well as a more measured critique from an early client at Answer.AI. The overall impression was a familiar one for vibe-coding veterans: with so many errors, overseeing the models takes as much work as doing the task manually. (While Devin’s rollout has been a bit rocky, it hasn’t stopped fundraisers from recognizing the potential – in March, Devin’s parent company, Cognition AI, reportedly raised hundreds of millions of dollars at a $4 billion valuation.)

Even supporters of the technology caution against unsupervised vibe-coding, seeing the new coding agents as powerful elements in a human-supervised development process.

“Right now, and I would say, for the foreseeable future, a human has to step in at code review time to look at the code that’s been written,” says Robert Brennan, the CEO of All Hands AI, which maintains OpenHands. “I’ve seen several people work themselves into a mess by just auto-approving every bit of code that the agent writes. It gets out of hand fast.”

Hallucinations are an ongoing problem as well. Brennan recalls one incident in which, when asked about an API that had been released after the OpenHands agent’s training data cutoff, the agent fabricated details of an API that fit the description. All Hands AI says it’s working on systems to catch these hallucinations before they can cause harm, but there isn’t a simple fix.

Arguably the best measure of agentic programming progress is the SWE-Bench leaderboards, where developers can test their models against a set of unresolved issues from open GitHub repositories. OpenHands currently holds the top spot on the verified leaderboard, solving 65.8% of the problem set. OpenAI claims that one of the models powering Codex, codex-1, can do better, listing a 72.1% score in its announcement – although the score came with a few caveats and hasn’t been independently verified.

The concern among many in the tech industry is that high benchmark scores don’t necessarily translate to truly hands-off agentic coding. If agentic coders can only solve three out of every four problems, they’re going to require significant oversight from human developers – particularly when tackling complex systems with multiple stages.

Like most AI tools, the hope is that improvements to foundation models will come at a steady pace, eventually enabling agentic coding systems to grow into reliable developer tools. But finding ways to manage hallucinations and other reliability issues will be crucial for getting there.

“I think there is a little bit of a sound barrier effect,” Brennan says. “The question is, how much trust can you shift to the agents, so they take more out of your workload at the end of the day?”



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleOpenAI launches advanced coding assistant Codex
Next Article What happens when you train your AI on AI-generated data?
Advanced AI Bot
  • Website

Related Posts

You can win $250K from OpenAI if you help solve archaeological mysteries with AI

May 20, 2025

Google I/O 2025: Google’s answer to Microsoft and OpenAI’s AI coding agents, Jules is now available for everyone to try

May 20, 2025

You can win $250K from OpenAI if you help solve archaeological mysteries with AI

May 20, 2025
Leave A Reply Cancel Reply

Latest Posts

How Exclusion Inspired Malene Barnett’s ‘Crafted Kinship,’ A Groundbreaking Book Celebrating Caribbean Makers

Calder Gardens in Philadelphia to Open September 21

Estate of Susan Rothenberg Joins Hauser & Wirth

Art Basel Qatar to Launch in February 2026

Latest Posts

I’m back at it: 1,000 total push-ups, pull-ups, squats every day

May 20, 2025

Inclusion Strategies in Workplace | Recruiting News Network

May 20, 2025

EU Commission: “AI Gigafactories” to strengthen Europe as a business location

May 20, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

YouTube LinkedIn
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.