Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

New AI architecture delivers 100x faster reasoning than LLMs with just 1,000 training examples

AI referrals to top websites were up 357% year-over-year in June, reaching 1.13B

Smuggled Nvidia AI Chips Worth $1 Billion Flood Chinese Black Market Despite U.S. Export Controls

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Industry AI
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
TechCrunch AI

A new AI coding challenge just published its first results – and they aren’t pretty

By Advanced AI EditorJuly 24, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


A new AI coding challenge has revealed its first winner — and set a new bar for AI-powered software engineers. 

On Wednesday at 5pm PST, the nonprofit Laude Institute announced the first winner of the K Prize, a multi-round AI coding challenge launched by Databricks and Perplexity co-founder Andy Konwinski. The winner was a Brazilian prompt engineer named Eduardo Rocha de Andrade, who will receive $50,000 for the prize. But more surprising than the win was his final score: he won with correct answers to just 7.5% of the questions on the test.

“We’re glad we built a benchmark that is actually hard,” said Konwinski. “Benchmarks should be hard if they’re going to matter,” he continued, adding: “Scores would be different if the big labs had entered with their biggest models. But that’s kind of the point. K Prize runs offline with limited compute, so it favors smaller and open models. I love that. It levels the playing field.”

Konwinski has pledged $1 million to the first open-source model that can score higher than 90% on the test.

Similar to the well-known SWE-Bench system, the K Prize tests models against flagged issues from GitHub as a test of how well models can deal with real-world programming problems. But while SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed as a “contamination-free version of SWE-Bench,” using a timed entry system to guard against any benchmark-specific training. For round one, models were due by March 12th. The K Prize organizers then built the test using only GitHub issues flagged after that date.

The 7.5% top score stands in marked contrast to SWE-Bench itself, which currently shows a 75% top score on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the disparity is due to contamination on SWE-Bench or just the challenge of collecting new issues from GitHub, but he expects the K Prize project to answer the question soon.

“As we get more runs of the thing, we’ll have a better sense,” he told TechCrunch, “because we expect people to adapt to the dynamics of competing on this every few months.”

Techcrunch event

San Francisco
|
October 27-29, 2025

It might seem like an odd place to fall short, given the wide range of AI coding tools already publicly available – but with benchmarks becoming too easy, many critics see projects like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

“I’m quite bullish about building new tests for existing benchmarks,” says Princeton researcher Sayash Kapoor, who put forward a similar idea in a recent paper. “Without such experiments, we can’t actually tell if the issue is contamination, or even just targeting the SWE-Bench leaderboard with a human in the loop.”

For Konwinski, it’s not just a better benchmark, but an open challenge to the rest of the industry. “If you listen to the hype, it’s like we should be seeing AI doctors and AI lawyers and AI software engineers, and that’s just not true,” he says. “If we can’t even get more than 10% on a contamination free SWE-Bench, that’s the reality check for me.”



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleEdimakor Unleash a Powerful Evolution of AI Video Editing
Next Article White House plan signals “open-weight first” era—and enterprises need new guardrails
Advanced AI Editor
  • Website

Related Posts

AI referrals to top websites were up 357% year-over-year in June, reaching 1.13B

July 26, 2025

Meta names Shengjia Zhao as chief scientist of AI superintelligence unit

July 25, 2025

Sam Altman warns there’s no legal confidentiality when using ChatGPT as a therapist

July 25, 2025

Comments are closed.

Latest Posts

Auction House Will Sell Egyptian Artifact Despite Concern From Experts

Anish Kapoor Lists New York Apartment for $17.75 M.

Artist Loses Final Appeal in Case of Apologising for ‘Fishrot Scandal’

US Appeals Court Overturns $8.8 M. Trademark Judgement For Yuga Labs

Latest Posts

New AI architecture delivers 100x faster reasoning than LLMs with just 1,000 training examples

July 26, 2025

AI referrals to top websites were up 357% year-over-year in June, reaching 1.13B

July 26, 2025

Smuggled Nvidia AI Chips Worth $1 Billion Flood Chinese Black Market Despite U.S. Export Controls

July 25, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • New AI architecture delivers 100x faster reasoning than LLMs with just 1,000 training examples
  • AI referrals to top websites were up 357% year-over-year in June, reaching 1.13B
  • Smuggled Nvidia AI Chips Worth $1 Billion Flood Chinese Black Market Despite U.S. Export Controls
  • Claude Code AI Automations for Community Management in 2025
  • Earnings Shock: Why IBM, Chipotle, and American Airlines Tumbled—and What Comes Next

Recent Comments

  1. Janine Bethel on OpenAI research reveals that simply teaching AI a little ‘misinformation’ can turn it into an entirely unethical ‘out-of-the-way AI’
  2. 打开Binance账户 on Tanka CEO Kisson Lin to talk AI-native startups at Sessions: AI
  3. Sign up to get 100 USDT on The Do LaB On Capturing Lightning In A Bottle
  4. binance Anmeldebonus on David Patterson: Computer Architecture and Data Storage | Lex Fridman Podcast #104
  5. nude on Brain-to-voice neuroprosthesis restores naturalistic speech

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.