Why OpenAI’s Codex Is Not As Good As Devin Or Replit

If you’re a software engineer, indie hacker, or startup founder who’s spent the last year tooling around with AI agents like Replit’s Ghostwriter, Cognition’s Devin, or Lovable’s smart terminals—well, OpenAI just entered the game, again.

Over the weekend, OpenAI rolled out Codex, a cloud-based software engineering agent that looks suspiciously like the future of dev work. It is available first to ChatGPT Pro, Team, and Enterprise users, with Pro priced at $200 a month; Plus users may have to wait a while for access.

Greg Brockman, co-founder of OpenAI, said during the live research preview that Codex is the company’s bet on vibe coding. The launch comes just days after OpenAI announced its acquisition of Windsurf for $3 billion. Windsurf, an AI-assisted coding tool formerly known as Codeium, is a direct competitor to Cursor, which is itself backed by OpenAI.

OpenAI is Vibing, But Without Internet

Codex isn’t another glorified autocomplete. It’s a multi-agent dev assistant that runs coding tasks in parallel, inside sandboxed environments preloaded with your repo, which sounds similar to Devin, but OpenAI argues that it’s not.

During the launch preview with Brockman, Katy Shi, one of the researchers at OpenAI, said, “Codex is as trustworthy, if not more trustworthy than my coworkers.” Shi added that she could access her coworkers’ logs without needing to talk to them.

Shi meant that with Codex, developers can do work like writing new features, debugging, writing tests, or proposing pull requests—and it will do all of that while showing you terminal logs, test outputs, and commit history, so you don’t have to trust it blindly.

This essentially means GitHub PRs can be drafted, tested, and explained by a bot that lives inside ChatGPT, making it possibly better than Devin.

The three tools aim at different things, though. Codex is an agent that runs coding tasks in the background in the cloud, Replit lets developers deploy apps, and Devin is pitched as an end-to-end software engineer.

Codex also has limitations, and they are pretty big ones. It is not connected to the internet, which makes it a weaker choice than Devin. That is currently the biggest criticism of the release, and the reason developers are holding back from adopting it in their workflows. Devin, for its part, is also still in early access.

It also needs well-scoped tasks. It sometimes fails tests or gets confused. And it won’t yet handle sprawling architectural decisions on its own. But for repeatable engineering chores, it’s surprisingly capable—and transparent.

OpenAI conveniently calls this a research preview. Maybe the team will connect it to the internet soon. Its ambitions, in any case, are anything but modest.

Codex is powered by codex-1, a variant of OpenAI’s o3 model explicitly tuned for software engineering. It was trained with reinforcement learning on thousands of real coding tasks, making it eerily good at mimicking human dev styles, coding conventions, and PR etiquette.

Devin, Cursor, Replit—Watch Your Backs

“Codex increases the value of being technical. If you can describe precisely what you want to build, you can get a massive amount done in parallel,” posted Josh Tobin from OpenAI. “That’s fundamentally a technical skill.”

But Cognition recently announced an update to Devin, offering a new agent-native IDE experience. Devin 2.0 supports multiple parallel instances, each with an interactive cloud-based IDE.

The update also lets developers take control at any point, supporting both collaborative and fully automated workflows, and lets them refine code and run tests within the IDE.

Cognition AI also announced additional features for Devin, including Interactive Planning, Devin Search, and Devin Wiki. This is where OpenAI’s Codex falls behind.

Inside ChatGPT, Codex is accessed via a sidebar. You create tasks with prompts, then click “Code” to generate changes or “Ask” to query your codebase. That is very different from Cursor’s “tab tab tab” model, but similar to Lovable and Replit.

Each task gets its own isolated environment, where Codex can edit files and run linters, test harnesses, and type checkers. Depending on complexity, a task can take anywhere from 1 to 30 minutes to complete. You can monitor its progress in real time.
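To make the "one isolated environment per task" idea concrete, here is a minimal Python sketch. It is not how Codex is implemented, and it does not use any OpenAI API. The function names (`run_task`, `run_tasks_in_parallel`), the task dictionary layout, and the use of a temporary directory as a stand-in for a sandbox are all assumptions made for illustration. Each task gets its own throwaway directory, runs one check command there, and returns a log you can inspect afterwards.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_task(name, files, check_cmd):
    """Run one task in its own throwaway directory and capture its log.

    Hypothetical helper: a temp directory is only a rough stand-in for a
    real sandbox and provides no security isolation.
    """
    with tempfile.TemporaryDirectory(prefix=f"task-{name}-") as workdir:
        # Stand-in for "preloading the repo": write the task's files.
        for rel_path, content in files.items():
            target = Path(workdir) / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)

        # Stand-in for a linter, test harness, or type checker.
        result = subprocess.run(
            check_cmd, cwd=workdir, capture_output=True, text=True
        )
        return {
            "task": name,
            "passed": result.returncode == 0,
            "log": result.stdout + result.stderr,
        }


def run_tasks_in_parallel(tasks, max_workers=4):
    """Run several independent tasks at once; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: run_task(**task), tasks))


if __name__ == "__main__":
    demo_tasks = [
        {
            "name": "passing",
            "files": {"check.py": "print('ok')\n"},
            "check_cmd": [sys.executable, "check.py"],
        },
        {
            "name": "failing",
            "files": {"check.py": "assert 1 + 1 == 3\n"},
            "check_cmd": [sys.executable, "check.py"],
        },
    ]
    for outcome in run_tasks_in_parallel(demo_tasks):
        status = "passed" if outcome["passed"] else "failed"
        print(f"{outcome['task']}: {status}")
        print(outcome["log"])
```

The point of returning the log alongside the pass/fail flag is the same one the article makes about transparency: a reviewer can read what actually ran rather than trusting the result blindly.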

It’s no coincidence that Codex seems to be eager to eat the lunches of agents like Devin, Cursor, and Replit’s AI tools. All these startups have been vying to become the default AI coding companion. But with Codex, OpenAI is using its distribution advantage—ChatGPT is already in millions of developers’ workflows.

As Santiago Valdarrama joked: “Literally everyone is freaking out over Codex like they didn’t do the exact same thing for Devin, Cursor, DeepSeek, and every GPT drop since 2.0… VCs will congratulate themselves and write posts about how Codex will enable the next trillion-dollar market… until the next shitty autocomplete drops.”

Codex is Good Enough for Now

Despite the sarcasm, there’s truth to the cycle. But Codex is not autocomplete. At OpenAI itself, engineers are using Codex to offload annoying chores like renaming variables, writing tests, and fixing bugs. “By reducing context-switching and surfacing forgotten to-dos, Codex helps engineers ship faster and stay focused on what matters most,” the company writes.

Codex isn’t being built in a vacuum. Early testers like Cisco, Temporal, Superhuman, and Kodiak Robotics are already using it.

Cisco is testing it across its engineering teams to accelerate product development. Temporal uses it to debug, scaffold features, and stay in flow by offloading background work.

Superhuman has even let product managers use Codex to write code, with engineers stepping in only for reviews. Kodiak, which builds autonomous driving tech, is using it to improve test coverage and debugging tools and, apparently, to navigate obscure parts of its stack.

Codex isn’t just stuck in ChatGPT either. OpenAI quietly launched Codex CLI last month—a terminal-based coding agent you can run locally. It brings the same models (o3 and o4-mini) into your dev environment.

Now, they’ve added codex-mini-latest, a lightweight version of codex-1 optimised for snappier Q&A and faster editing inside the CLI. OpenAI is handing out $5–$50 in free API credits for Codex CLI for Plus and Pro users. No excuses not to try it.

“We imagine a future where developers drive the work they want to own and delegate the rest to agents,” OpenAI wrote. You still need to know what you want to build, but you may never have to write boilerplate again.

Codex doesn’t kill Replit, Devin, or Lovable overnight. But it does something much more dangerous: it sets a new standard, albeit one without internet access. That standard is multi-agent, cloud-based, verifiable, and integrated into ChatGPT.

It’s the baseline now. Everyone else needs to catch up.
