Advanced AI News
TechCrunch AI

OpenAI’s new GPT-4.1 AI models focus on coding

By Advanced AI Editor | April 14, 2025 | 3 min read


OpenAI on Monday launched a new family of models called GPT-4.1. Yes, “4.1” — as if the company’s nomenclature wasn’t confusing enough already.

There’s GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, all of which OpenAI says “excel” at coding and instruction following. Available through OpenAI’s API but not ChatGPT, the multimodal models have a 1-million-token context window, meaning they can take in roughly 750,000 words in one go (longer than “War and Peace”).
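The tokens-to-words figure above follows from a common rule of thumb of roughly 0.75 English words per token; this is a heuristic, not an OpenAI-published constant, but it matches the article's 1-million-token / 750,000-word framing:

```python
# Rough sketch of the context-window arithmetic.
# WORDS_PER_TOKEN is a heuristic for English prose, not an official figure.

CONTEXT_WINDOW_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75

def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * words_per_token)

print(approx_words(CONTEXT_WINDOW_TOKENS))  # → 750000
```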

GPT-4.1 arrives as OpenAI rivals like Google and Anthropic ratchet up efforts to build sophisticated programming models. Google’s recently released Gemini 2.5 Pro, which also has a 1-million-token context window, ranks highly on popular coding benchmarks. So do Anthropic’s Claude 3.7 Sonnet and Chinese AI startup DeepSeek’s upgraded V3.

It’s the goal of many tech giants, including OpenAI, to train AI coding models capable of performing complex software engineering tasks. OpenAI’s grand ambition is to create an “agentic software engineer,” as CFO Sarah Friar put it during a tech summit in London last month. The company asserts its future models will be able to program entire apps end-to-end, handling aspects such as quality assurance, bug testing, and documentation writing.

GPT-4.1 is a step in this direction.

“We’ve optimized GPT-4.1 for real-world use based on direct feedback to improve in areas that developers care most about: frontend coding, making fewer extraneous edits, following formats reliably, adhering to response structure and ordering, consistent tool usage, and more,” an OpenAI spokesperson told TechCrunch via email. “These improvements enable developers to build agents that are considerably better at real-world software engineering tasks.”

OpenAI claims the full GPT-4.1 model outperforms its GPT-4o and GPT-4o mini models on coding benchmarks, including SWE-bench. GPT-4.1 mini and nano are said to be more efficient and faster at the cost of some accuracy, with OpenAI saying GPT-4.1 nano is its speediest — and cheapest — model ever.

GPT-4.1 costs $2 per million input tokens and $8 per million output tokens. GPT-4.1 mini is $0.40/million input tokens and $1.60/million output tokens, and GPT-4.1 nano is $0.10/million input tokens and $0.40/million output tokens.
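At those rates, the cost of a call is straightforward to estimate. A minimal sketch, using the per-million-token prices quoted above (which may change over time):

```python
# Cost calculator for the quoted GPT-4.1 pricing tiers.
# Rates are USD per 1M tokens, as (input, output), taken from the article.

PRICES = {
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one API call at the quoted rates."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a 100k-token prompt with a 5k-token reply on full GPT-4.1:
print(round(request_cost("gpt-4.1", 100_000, 5_000), 4))  # → 0.24
```

The same call on GPT-4.1 nano would cost a twentieth as much, which is where the "speediest and cheapest" positioning shows up in practice.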

According to OpenAI’s internal testing, GPT-4.1, which can generate more tokens at once than GPT-4o (32,768 versus 16,384), scored between 52% and 54.6% on SWE-bench Verified, a human-validated subset of SWE-bench. (OpenAI noted in a blog post that some solutions to SWE-bench Verified problems couldn’t run on its infrastructure, hence the range of scores.) Those figures are slightly under the scores reported by Google and Anthropic for Gemini 2.5 Pro (63.8%) and Claude 3.7 Sonnet (62.3%), respectively, on the same benchmark.

In a separate evaluation, OpenAI probed GPT-4.1 using Video-MME, which is designed to measure the ability of a model to “understand” content in videos. GPT-4.1 reached a chart-topping 72% accuracy on the “long, no subtitles” video category, claims OpenAI.

While GPT-4.1 scores reasonably well on benchmarks and has a more recent “knowledge cutoff,” giving it a better frame of reference for current events (up to June 2024), it’s important to keep in mind that even some of the best models today struggle with tasks that wouldn’t trip up experts. For example, many studies have shown that code-generating models often fail to fix, and even introduce, security vulnerabilities and bugs.

OpenAI acknowledges, too, that GPT-4.1 becomes less reliable (i.e., likelier to make mistakes) the more input tokens it has to deal with. On one of the company’s own tests, OpenAI-MRCR, the model’s accuracy decreased from around 84% with 8,000 tokens to 50% with 1 million tokens. GPT-4.1 also tended to be more “literal” than GPT-4o, says the company, sometimes necessitating more specific, explicit prompts.


