Advanced AI News

AI researcher Andrej Karpathy says he’s “bearish on reinforcement learning” for LLM training

By Advanced AI Editor | August 30, 2025 | 3 Mins Read

Andrej Karpathy, a former Tesla and OpenAI researcher, is part of a growing movement in the AI community calling for a new approach to building large language models (LLMs) and AI systems.

On X, Karpathy said he is skeptical of reinforcement learning (RL) as a long-term foundation for LLM training. He argues that RL reward functions are “super sus”: unreliable, easy to game, and poorly suited to teaching “intellectual problem solving” skills.

This stands out because current “reasoning” models depend heavily on reinforcement learning, and companies like OpenAI see the approach as scalable and adaptable to new tasks. Reasoning models have powered most of the recent AI hype and progress, while purely pre-trained LLMs seem to have hit a plateau.

Reinforcement learning is often used to help LLMs break down tasks into logical steps and make their reasoning process more transparent. RL works best when there’s a clear right or wrong answer, since the model gets positive feedback for solving problems in a step-by-step way.
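
To make the "clear right or wrong answer" case concrete, here is a minimal sketch of a verifiable reward function for arithmetic-style tasks. It is an illustration only, not how any particular lab computes rewards; the function names and the "last number in the completion" rule are assumptions made for this example.

```python
import re
from typing import Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the last number appearing in the completion, or None.

    Assumption for this sketch: the model's final answer is the last
    number it writes.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the final number equals the reference, else 0.0."""
    answer = extract_final_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward("12 * 3 = 36, so the answer is 36.", "36"))
    print(verifiable_reward("Probably 35.", "36"))
    print(verifiable_reward("36", "36"))  # no reasoning shown, same reward
```

Because the check looks only at the final number, a bare guess earns the same reward as a worked solution. That is one simple way a reward of this kind can be gamed.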

Despite his criticism, Karpathy still sees RL finetuning as a step up from classic supervised finetuning (SFT), which just mimics human answers. He thinks RL leads to more nuanced model behavior and believes RL finetuning will “continue to grow substantially.”

Still, Karpathy says real breakthroughs will need fundamentally different learning mechanisms. Humans, he points out, use much more powerful and efficient ways to learn—methods that “haven’t been properly invented and scaled yet.” This puts him in line with a growing group of LLM skeptics who argue that the next leap in AI will only come from new approaches.

One direction he mentions is “system prompt learning,” where learning happens at the level of tokens and context, not by changing model weights. Karpathy compares this to what happens during human sleep, when the brain consolidates and stores information.
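
The article does not describe a concrete mechanism for system prompt learning, so the following is only a rough sketch of the general idea under stated assumptions: lessons are stored as plain text and prepended to future prompts, while model weights stay untouched. The `PromptMemory` class and its methods are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptMemory:
    """Hypothetical store of text lessons kept in context, not in weights."""

    base_prompt: str
    lessons: List[str] = field(default_factory=list)

    def add_lesson(self, lesson: str) -> None:
        """Record a lesson once; blank and duplicate lessons are ignored."""
        lesson = lesson.strip()
        if lesson and lesson not in self.lessons:
            self.lessons.append(lesson)

    def render(self) -> str:
        """Build the system prompt that would be sent with the next request."""
        if not self.lessons:
            return self.base_prompt
        notes = "\n".join(f"- {item}" for item in self.lessons)
        return f"{self.base_prompt}\n\nLessons learned:\n{notes}"


if __name__ == "__main__":
    memory = PromptMemory("You are a helpful assistant.")
    memory.add_lesson("When counting letters, spell the word out first.")
    print(memory.render())
```

In this sketch the "learning" is just an edit to the text the model reads on the next call. How lessons would be extracted or consolidated in a real system is left open here.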

Interactive environments as the next major training paradigm for language models

Karpathy also sees promise in training LLMs through interactive environments—digital spaces where models can act and see the consequences. Earlier training phases relied on internet text for pre-training and question-and-answer data for fine-tuning, but training in environments gives models real feedback based on what they actually do.

With this approach, LLMs could go beyond simply guessing how a person might respond and start learning to make decisions, testing how well those choices work in controlled scenarios. Karpathy says these environments could be used for both training and evaluation. The main challenge now is building a large, diverse, and high-quality set of environments, much like the text datasets used in earlier training phases.
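
As a toy illustration of acting and seeing the consequences, the sketch below defines a tiny number-guessing environment and an agent that uses the feedback it receives. Everything here is an assumption for illustration: the `reset`/`step` interface is loosely modeled on a convention common in RL code rather than on any specific library, and the scripted `BisectionAgent` merely stands in for a language model.

```python
from typing import Tuple


class GuessNumberEnv:
    """Toy environment: the agent guesses a hidden integer and gets feedback."""

    def __init__(self, target: int, low: int = 1, high: int = 100, max_steps: int = 10):
        self.target, self.low, self.high = target, low, high
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> str:
        self.steps = 0
        return f"Guess a number between {self.low} and {self.high}."

    def step(self, guess: int) -> Tuple[str, float, bool]:
        """Return (observation, reward, done) for one guess."""
        self.steps += 1
        if guess == self.target:
            return "correct", 1.0, True
        observation = "higher" if guess < self.target else "lower"
        return observation, 0.0, self.steps >= self.max_steps


class BisectionAgent:
    """Scripted stand-in for a model: narrows its range using feedback."""

    def __init__(self, low: int = 1, high: int = 100):
        self.low, self.high = low, high
        self.last_guess = low

    def act(self, observation: str) -> int:
        if observation == "higher":
            self.low = self.last_guess + 1
        elif observation == "lower":
            self.high = self.last_guess - 1
        self.last_guess = (self.low + self.high) // 2
        return self.last_guess


def run_episode(env: GuessNumberEnv, agent: BisectionAgent) -> float:
    """Play one episode and return the total reward collected."""
    observation, total, done = env.reset(), 0.0, False
    while not done:
        observation, reward, done = env.step(agent.act(observation))
        total += reward
    return total


if __name__ == "__main__":
    env = GuessNumberEnv(target=37)
    print(run_episode(env, BisectionAgent()), env.steps)
```

The same `run_episode` loop could score an agent for evaluation or supply rewards for training, which mirrors the dual use described above.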

Back in August 2024, Karpathy argued that reinforcement learning could be a breakthrough for LLM training, provided it relied on truly objective, measurable reward functions. At the time, he criticized reinforcement learning from human feedback (RLHF), then the standard approach, for being too dependent on human preferences, calling it more of a “vibe check” than a real goal. He said that solving complex problems requires well-defined success criteria. Even as reasoning models advance, Karpathy does not appear to believe this core issue has been solved.

Karpathy’s thinking lines up with calls for a paradigm shift from DeepMind researchers Richard Sutton and David Silver in their essay “Welcome to the Era of Experience.” Both argue that the next wave of advanced AI can’t just copy human language or judgments. Instead, they say, future AI needs to become more robust, creative, and adaptable by learning directly from experience and independent action.


