Advanced AI News

Evolving Models And Games: Are We Near AGI?

By Advanced AI Editor, August 19, 2025

One of the best ways to evaluate an AI model is to put it to the test on problems that stymie skilled or experienced humans.

We see this with math data sets made for Ph.D.s, and other data sets showing off a model’s reasoning capabilities. But there’s another way to check progress with LLMs as well.

It’s called ARC AGI, short for Abstraction and Reasoning Corpus for Artificial General Intelligence, and it has been around since 2019, when François Chollet introduced it in his paper “On the Measure of Intelligence.” The benchmark measures general (“fluid”) intelligence, meaning the ability to reason, adapt, and solve novel problems efficiently, rather than memorization or domain-specific knowledge. Since then, it has been treated as a gold standard for gauging how good machines are at solving abstract problems.

There are also further iterations of ARC AGI still in the works: after releasing ARC AGI 2 earlier this year, Chollet and team are now working on a new set of tasks called ARC AGI 3.

The ARC AGI 3 set is different from what came before. It consists of small games, without instructions.

The human or AI agents who take on this set are supposed to intuit what each game requires of them just from playing around with it visually. You can try some of these cryptic little games online at the ARC AGI web site. They are not trivial, but with a little work you should be able to figure out what each game wants you to do.

Metrics with OpenAI Models

Here’s the kicker: OpenAI’s new model, o3, has scored very high on the ARC AGI 1 set, something like 85%. On the ARC AGI 2 set, though, it has so far managed only about 3%. The high mark on set 1 is widely seen as another step toward artificial general intelligence, or AGI, but it is easy to misread: o3 did not score 85% on ARC AGI 2, and it hasn’t been tested on set 3 at all, since that set is still in development.

That’s important to know, because an AI that tests high on ARC AGI 3 would put us much further along the road to AGI, or the singularity.

Chollet Weighs In

In an explainer video running some 35 minutes, Chollet takes the stage to talk about how he created the original ARC AGI set, what has happened since then, and much of the theory and context behind the test. There’s far too much in his talk for me to cover here, but as one example, Chollet contrasts two definitions of artificial intelligence. One, attributed to Marvin Minsky (whom I talk about a lot), characterizes AI as the mimicry of human brains. The other, espoused by John McCarthy, holds that AI progress essentially involves machines being able to adapt to new realities and previously unknown tasks.

“There’s a big difference between memorized skills, which are static and task specific, and fluid general intelligence, the ability to understand something you’ve never seen before, on the fly,” he says.

He also references the techniques used by these new models:

“In 2025, we have suddenly moved on from the pre-training scaling part,” Chollet adds. “I mean, we’re now fully in the era of test adaptation. The test adaptation is all about the ability of the model to modify its own behavior, dynamically, based on the specific data it encounters during inference. That covers techniques like test time training, program synthesis, chain of thought synthesis, where the model tries to reprogram itself for the task at hand. And today, every single AI approach that performs well on ARC is using one of these techniques.”
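To make the idea of program synthesis at inference time a little more concrete, here is a toy sketch in Python. It is not Chollet’s method, nor that of any ARC AGI entrant. It assumes a made-up puzzle format (small grids of integers with a few input/output demonstration pairs) and a hypothetical, hand-picked list of candidate transformations. All function names are illustrative. The solver sees the demonstrations only at test time, searches for a transformation consistent with every one of them, and then applies it to a new input.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]

# Hypothetical candidate "programs": simple whole-grid transformations.
def identity(g: Grid) -> Grid:
    return [row[:] for row in g]

def flip_horizontal(g: Grid) -> Grid:
    return [row[::-1] for row in g]

def flip_vertical(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]

def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]

def rotate_clockwise(g: Grid) -> Grid:
    return [list(col) for col in zip(*g[::-1])]

CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
    "rotate_clockwise": rotate_clockwise,
}

def synthesize(demos: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the name of the first candidate that reproduces every demo pair."""
    for name, program in CANDIDATES.items():
        if all(program(x) == y for x, y in demos):
            return name
    return None

def solve(demos: List[Tuple[Grid, Grid]], test_input: Grid) -> Optional[Grid]:
    """Pick a program from the demos seen at test time, then apply it."""
    name = synthesize(demos)
    return None if name is None else CANDIDATES[name](test_input)

if __name__ == "__main__":
    demos = [
        ([[1, 2], [3, 4]], [[2, 1], [4, 3]]),
        ([[5, 0, 0]], [[0, 0, 5]]),
    ]
    print(synthesize(demos))
    print(solve(demos, [[7, 8, 9]]))
```

The point of the sketch is only the shape of the loop: nothing about the specific task is baked in ahead of time, and the behavior is chosen from the examples encountered during inference. Real systems search vastly larger program spaces and use learned models to guide that search.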

By way of metaphor, Chollet asks us to imagine two things: a network of roads, and a road-building company. If you have a road network, he points out, you can go to the places it already reaches. If you have a road-building company, you can create new roads and new routes to get from point A to point B, and so on.

“Don’t confuse the roads, and the process that created the roads,” he says.

The Agentic Contest

As with prior benchmarks, developers will be bringing new approaches to the table, hoping to score well on challenges like ARC AGI 3. ARC AGI also gives out prizes for competitive results. Those prizes are part of why this corner of the internet gets attention, as those close to the process try to achieve better scores with a particular AI engine.

In that context, I think we should realize what we’re looking at here. For anyone who is concerned about AGI, ARC gives us a way to test, to see how far we are on this journey. Perhaps by 2027, models will be able to score highly on set 3, or perhaps not. Everything is happening very, very quickly. Stay tuned.


