Advanced AI News
New Study Warns of Catastrophic Overtraining in Large Language Models

By Advanced AI Editor | April 13, 2025


Source: Shutterstock

The race to build ever-larger language models is driven by the assumption that more pre-training data equals better performance. It’s no surprise that AI companies have been scrambling to find enough quality data to train their models, often resorting to synthetic data to build and fine-tune them. But what if this core assumption is flawed?

A new study warns that more pre-training data may not always lead to better AI models. Researchers from Carnegie Mellon University, Stanford University, Harvard University, and Princeton University describe a phenomenon they call “catastrophic overtraining.” Their research suggests that extending pre-training can actually degrade a model’s ability to be fine-tuned effectively, leading to poorer performance in real-world applications.

The researchers challenge the “more is better” belief when it comes to training AI models. “Contrary to common belief, longer pre-training does not always lead to better post-trained models,” wrote the authors in their study published on arXiv. “We have shown that this is a consequence of a broader underlying phenomenon where models become more sensitive to perturbations as they are pre-trained on more tokens.”

Why do AI models require pre-training? AI companies use pre-training to teach AI systems foundational skills relevant to their tasks. These can include understanding language, analyzing images, predicting sequences, and recognizing patterns in data.

Pre-training plays an important role as it allows models to generalize knowledge, adapt to diverse contexts, and perform effectively across a wide range of tasks. Just to be clear, the researchers don’t reject pre-training but suggest developers need to be more strategic about how much pre-training is enough.

To understand how the amount of pre-training impacts AI models, the researchers compared two versions of Ai2’s open-source OLMo-1B model: one trained on 2.3 trillion tokens, the other on 3 trillion tokens. Surprisingly, the model trained on more data performed worse after fine-tuning, showing 2-3% lower accuracy on standard benchmarks like ARC-Challenge, PIQA, and AlpacaEval.

The authors explain this degradation in performance through what they call “progressive sensitivity.” As models are trained for longer, their internal parameters become increasingly sensitive to changes, such as tweaking the model during fine-tuning or adding more data. This heightened sensitivity means that minor adjustments, or even small amounts of noise in the data, can seriously disrupt what the model has already learned.

The study supports its findings through evidence from multiple angles. When the researchers added Gaussian noise to pre-trained models, they found performance became significantly worse with increasing pre-training tokens. Additionally, they validated their results using a different setup involving fine-tuned benchmarks, which yielded similar outcomes.
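The perturbation probe described above can be sketched in a few lines: add Gaussian noise to a checkpoint’s parameters and measure how much error grows. The snippet below is a toy illustration only, with a seeded linear model standing in for an LLM checkpoint (the data, noise scale, and function names are invented for illustration; the study perturbs actual OLMo checkpoints and evaluates on real benchmarks).

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation_sensitivity(weights, X, y, sigma=0.05, trials=50):
    """Average increase in mean squared error after adding Gaussian
    noise of scale `sigma` to a model's weights (a toy stand-in for
    perturbing a pre-trained checkpoint's parameters)."""
    base = np.mean((X @ weights - y) ** 2)
    deltas = []
    for _ in range(trials):
        noisy = weights + rng.normal(0.0, sigma, size=weights.shape)
        deltas.append(np.mean((X @ noisy - y) ** 2) - base)
    return float(np.mean(deltas))

# Toy regression problem standing in for a checkpoint under test.
X = rng.normal(size=(500, 20))
w_star = rng.normal(size=20)
y = X @ w_star + 0.1 * rng.normal(size=500)

# Fit, then measure how much a fixed-size weight perturbation hurts.
w_fit, *_ = np.linalg.lstsq(X, y, rcond=None)
print(perturbation_sensitivity(w_fit, X, y))  # positive: noise only hurts a fitted model
```

In the study, this kind of probe is run across checkpoints with increasing pre-training token counts; the reported finding is that the degradation grows as pre-training extends.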

The researchers acknowledge that their findings are not universal; their experiments suggest the risk of catastrophic overtraining is higher for smaller models. They also emphasize that overtraining can’t always be fixed, even with good techniques, if the tasks aren’t well aligned.


“Catastrophic overtraining may be inevitable, even if the fine-tuning process is regularized, especially when the pre-training and fine-tuning tasks are misaligned,” shared the researchers. This highlights the importance of ensuring alignment between training and fine-tuning objectives.

AI model pre-training is a crucial component of the development process. However, the study’s findings highlight the risks of overtraining. So, what is the sweet spot? According to the researchers, it involves striking a balance between base model quality and post-training adaptability.

Developers may need to rethink their approach to building AI models. As the researchers suggest, the focus should move away from simply scaling up data and model size toward optimizing the entire training pipeline. “Our findings call for a renewed focus on model scaling that considers the entire training pipeline,” the researchers emphasize.

The authors emphasize the need for further research to explore the factors that determine when and how catastrophic overtraining occurs. However, a key takeaway from their study is that by adopting smarter strategies for AI development, less can sometimes be more. 
