
Lifelong Learning for Large Models: The Razor Principle is Key

By Advanced AI Editor | September 10, 2025


We are in an era of booming large models, with industries across the board eager to leverage their capabilities. However, freeing these models from the constraints of 'static' learning and achieving lifelong learning has become a necessary step toward true AGI. Recently, researchers from MIT's Improbable AI Lab published a paper on arXiv focusing on catastrophic forgetting in the post-training of large models, proposing a thought-provoking viewpoint: the Occam's razor principle may be the key to solving this problem.

SFT vs. RL: Who is Better at Knowledge Retention?

The core of the research lies in comparing two common post-training methods: supervised fine-tuning (SFT) and reinforcement learning (RL). Surprisingly, even when SFT and RL perform similarly on a new task, SFT often sacrifices old knowledge to improve on the new task, while RL can learn new skills while better retaining existing capabilities. The researchers summarized this phenomenon as a 'forgetting law': the further the fine-tuned model drifts from the original model on the new task's distribution, the more severe the forgetting. They found that a key metric for measuring this drift is KL divergence. Specifically, when a model is fine-tuned on a new task, the extent of forgetting can be predicted by the KL divergence between the fine-tuned policy and the base policy. More importantly, RL tends to choose solutions with smaller KL divergence, i.e., solutions closer to the original model, which makes RL less prone to forgetting than SFT.
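To make that metric concrete, here is a minimal sketch (not the paper's code) of how one might estimate the average per-token KL divergence between a fine-tuned policy and its base model on new-task prompts. The model names and prompts are placeholders; in practice the two checkpoints would be the base model and its fine-tuned variant.

```python
# Sketch: estimate KL(fine-tuned || base) on new-task prompts, the quantity
# the authors report as predictive of forgetting. Placeholders throughout.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "gpt2"    # stand-in for the original (base) policy
tuned_name = "gpt2"   # stand-in for the fine-tuned checkpoint

tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name).eval()
tuned = AutoModelForCausalLM.from_pretrained(tuned_name).eval()

@torch.no_grad()
def mean_token_kl(prompts):
    """Average per-token KL(tuned || base) over the given new-task prompts."""
    kls = []
    for text in prompts:
        ids = tok(text, return_tensors="pt").input_ids
        logp_tuned = F.log_softmax(tuned(ids).logits, dim=-1)
        logp_base = F.log_softmax(base(ids).logits, dim=-1)
        # KL summed over the vocabulary, averaged over token positions
        kl = (logp_tuned.exp() * (logp_tuned - logp_base)).sum(-1).mean()
        kls.append(kl.item())
    return sum(kls) / len(kls)

print(mean_token_kl(["Translate to French: good morning", "2 + 2 ="]))
```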

RL’s ‘Razor’: The KL Minimum Path Principle

The researchers attributed RL’s advantages to its ‘KL preference’. In new tasks, there are many solutions that can achieve high performance. RL naturally prefers to select solutions that are closer to the original model (with smaller KL divergence); while SFT may converge to solutions that are far from the original model, resulting in severe forgetting. The core theoretical contribution is ‘RL’s razor’—that among all methods for solving new tasks, RL prefers solutions that are closest to the original model in terms of KL divergence. To validate the KL hypothesis, the researchers constructed an ideal ‘oracle SFT’ distribution: it achieves perfect accuracy on new tasks while minimizing KL. The results showed that training on this distribution resulted in even less forgetting than RL. This indicates that RL’s advantage does not stem from some ‘essential difference’, but from its implicit execution of KL minimization. As long as the training process leans toward KL minimum solutions, model forgetting will decrease.
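A toy illustration of the razor, using made-up numbers rather than anything from the paper: among several candidate policies that all solve the new task equally well, the razor-preferred solution is simply the one with the smallest KL divergence to the base policy.

```python
# Toy illustration of "RL's razor": among candidate policies assumed to reach
# the same accuracy on the new task, prefer the one closest (in KL) to the base.
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

base = [0.5, 0.3, 0.2]            # base policy over 3 actions
candidates = {                    # all assumed to solve the new task
    "A": [0.60, 0.25, 0.15],
    "B": [0.90, 0.05, 0.05],
    "C": [0.55, 0.30, 0.15],
}
for name, p in candidates.items():
    print(name, round(kl(p, base), 4))
best = min(candidates, key=lambda k: kl(candidates[k], base))
print("KL-minimal (razor-preferred) solution:", best)
```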

The Advantages of Online Policy Learning and Future Outlook

To understand what mechanism drives RL's KL-conservative behavior, the researchers compared four different training paradigms. The analysis revealed that the on-policy nature of data collection, rather than the use of negative examples, is the key factor. On-policy methods maintain smaller KL shifts and better retention of prior tasks, while offline methods behave similarly whether or not negative examples are used. This offers a new perspective on post-training: to achieve continuous adaptation without forgetting, algorithms should explicitly aim to minimize KL divergence from the base model, establishing KL divergence as a fundamental design principle for continual learning systems. The work opens the door to future training methods that combine RL's ability to retain prior knowledge with the efficiency of SFT, allowing foundation models to truly 'learn for a lifetime'.

For practitioners working with foundation models, the research offers clear guidance: when continuous adaptation matters, on-policy RL methods have a significant advantage over standard fine-tuning. The KL divergence metric also provides a practical tool for monitoring and predicting forgetting during model adaptation. The work helps explain why common practices such as KL regularization in RLHF are effective, elevating empirical observations to a theoretical foundation.
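As a rough sketch of what treating KL as a design principle can look like in practice, the snippet below adds a per-token KL penalty against the base model to the task reward, in the spirit of common RLHF-style regularization. The coefficient beta and the reward are placeholders, and this is not the paper's algorithm.

```python
# Sketch: KL-penalized reward against the base policy (RLHF-style), assuming
# the caller supplies logits from the current policy and the frozen base model.
import torch
import torch.nn.functional as F

def kl_penalized_reward(task_reward, logits_policy, logits_base, beta=0.05):
    """task_reward: (batch,) scalars; logits_*: (batch, seq, vocab)."""
    logp = F.log_softmax(logits_policy, dim=-1)
    logq = F.log_softmax(logits_base, dim=-1)
    per_token_kl = (logp.exp() * (logp - logq)).sum(-1)   # (batch, seq)
    return task_reward - beta * per_token_kl.mean(dim=-1)  # (batch,)
```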

How do you think we can better balance the acquisition of new knowledge and the retention of old knowledge on the path to achieving general artificial intelligence?




Source link
