July 5, 2025, 12:24 am IDT
Updated: July 5, 2025, 1:38 am IDT
Chollet and his team are building a “lifelong-learning, self-improving, DL-guided program synthesis engine”
Intelligence is a process, not a skill. Attributing intelligence to a crystallized behavior program is a category error, a fundamental misunderstanding that has guided, and misguided, the field of artificial intelligence for years. This was the central thesis of François Chollet, the creator of Keras, as he laid out a new path toward Artificial General Intelligence (AGI).
Speaking at the AI Startup School in San Francisco, Chollet presented a sharp critique of the industry’s recent obsession with scaling large language models. He argued that while the “pre-training scaling era” from 2020 to 2024 crushed benchmarks, it led the community to confuse performance on known tasks with genuine intelligence. The industry, he explained, became fixated on the idea that simply cramming more data into bigger models would spontaneously generate AGI. This approach, however, produces systems that are masters of automation, not invention.
Chollet’s argument hinges on a critical distinction. He contrasts static, memorized skills with fluid intelligence—the ability to adapt and solve novel problems on the fly. To prove the limits of scaling, he pointed to his own creation, the Abstraction and Reasoning Corpus (ARC). Despite a 50,000x scale-up in base LLMs from GPT-2 to GPT-4.5, performance on ARC-1, a test of pure fluid intelligence, rose only from roughly 0% to about 10%. This demonstrated that “fluid intelligence does not emerge from pre-training scaling alone.” Models were becoming phenomenal memorization engines, but they were not learning to reason.
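To make the distinction concrete, here is a toy illustration (my own, not an official ARC task) of the shape of an ARC-style problem: a few input→output grid pairs demonstrating a hidden rule, plus a test input. A solver must infer the rule from the demonstrations alone, with no prior training on that specific rule.

```python
# Toy ARC-style task: the hidden rule here is "reflect the grid left-to-right".
# Grids are lists of rows of integer color codes, as in the real benchmark.

def reflect_lr(grid):
    """Mirror each row of a grid of integer color codes."""
    return [list(reversed(row)) for row in grid]

demonstrations = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 3, 0], [0, 5, 0]], [[0, 3, 3], [0, 5, 0]]),
]

test_input = [[7, 0, 0], [0, 7, 0]]

# A solver with fluid intelligence would recover the rule from the pairs;
# here we only verify that the rule is consistent with the demonstrations.
assert all(reflect_lr(inp) == out for inp, out in demonstrations)
print(reflect_lr(test_input))  # [[0, 0, 7], [0, 7, 0]]
```

Memorization does not help here: each task's rule is novel, so retrieving stored answers fails and the system must genuinely generalize.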
The field has begun to recognize this limitation. A significant paradigm shift occurred in 2024 towards Test-Time Adaptation (TTA), where models are designed to learn and adapt during inference. This new approach, which includes techniques like test-time training and symbolic program synthesis, has finally started to show meaningful progress on the ARC benchmark. Instead of merely retrieving pre-loaded knowledge, these systems are demonstrating a nascent ability to learn, a core component of true intelligence.
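The core idea of test-time adaptation can be sketched in a few lines. This is my own toy setup, not any specific system from the talk: rather than querying a frozen model, we run a few gradient steps on the task's demonstration pairs at inference time, then answer the query.

```python
# Minimal sketch of test-time adaptation: fit a one-parameter model
# y = w * x to a task's demonstration pairs at inference time.
# (Real systems start from a pre-trained model rather than from scratch.)

def adapt_and_predict(demos, query, lr=0.05, steps=200):
    """Run gradient descent on the demo pairs, then predict for the query."""
    w = 0.0
    for _ in range(steps):
        # gradient of mean squared error over the demonstrations
        grad = sum(2 * (w * x - y) * x for x, y in demos) / len(demos)
        w -= lr * grad
    return w * query

# A "novel task" defined only by its demonstrations: y = 3x.
demos = [(1.0, 3.0), (2.0, 6.0)]
print(round(adapt_and_predict(demos, 10.0), 2))  # 30.0
```

The point of the sketch is the shift in when learning happens: the parameters are updated per task, during inference, rather than fixed once at training time.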
This illustrates what Chollet calls the shortcut rule: you achieve what you target, at the expense of everything else. By targeting exam-style benchmarks, the field built powerful memorization engines; by targeting fluid intelligence, we may finally build innovation engines.
To build AGI, Chollet argues, we must fuse two types of abstraction. The first is the intuitive, perception-based abstraction that deep learning excels at. The second is the symbolic, program-like reasoning that allows for rigorous, step-by-step problem-solving. At his new research lab, Ndea, Chollet and his team are building a “lifelong-learning, self-improving, DL-guided program synthesis engine” designed to bridge these two worlds. The system is architected as a meta-learner that develops a global library of abstract subroutines, effectively learning how to learn over time. This approach moves away from building static artifacts and toward creating systems that are in a constant state of becoming.