New Study Warns of Catastrophic Overtraining in Large Language Models

By Advanced AI Bot | April 13, 2025 | 4 min read

The race to build ever-larger language models is driven by the assumption that more pre-training data equals better performance. It’s no surprise that AI companies have been scrambling to find enough quality data to train their models, often resorting to synthetic data for building and fine-tuning them. But what if this core assumption is flawed?

A new study warns that more pre-training data may not always lead to better AI models. Researchers from leading universities, including Carnegie Mellon University, Stanford University, Harvard University, and Princeton University, highlight a phenomenon they call “Catastrophic Overtraining.” Their research suggests that extending pre-training can actually degrade a model’s ability to be fine-tuned effectively, leading to poorer performance in real-world applications.

The researchers challenge the “more is better” belief when it comes to training AI models. “Contrary to common belief, longer pre-training does not always lead to better post-trained models,” wrote the authors in their study published on arXiv. “We have shown that this is a consequence of a broader underlying phenomenon where models become more sensitive to perturbations as they are pre-trained on more tokens.”

Why do AI models require pre-training? AI companies use pre-training to teach AI systems foundational skills relevant to their tasks, from understanding language and analyzing images to predicting sequences and recognizing patterns in data.

Pre-training plays an important role because it allows models to generalize knowledge, adapt to diverse contexts, and perform effectively across a wide range of tasks. To be clear, the researchers don’t reject pre-training; they suggest developers need to be more strategic about how much is enough.

To understand how extended pre-training affects AI models, the researchers compared two versions of Ai2’s open-source OLMo-1B model: one pre-trained on 2.3 trillion tokens, the other on 3 trillion tokens. Surprisingly, the model trained on more data performed worse after fine-tuning, showing 2-3% lower accuracy on standard benchmarks such as ARC-Challenge, PIQA, and AlpacaEval.
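
To make the experiment concrete: the protocol amounts to holding the fine-tuning data, hyperparameters, and evaluation fixed while varying only the pre-training checkpoint. Below is a minimal sketch of that loop; the checkpoint revision names, the tiny stand-in datasets, and the `finetune_and_score` helper are hypothetical placeholders, not the paper’s actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical intermediate-checkpoint revisions. Ai2 publishes OLMo
# checkpoints at many token counts, but these exact names are placeholders.
CHECKPOINTS = {"2.3T tokens": "step-2.3T", "3.0T tokens": "step-3.0T"}

# Tiny stand-in instruction-tuning and eval sets, for illustration only.
train_texts = ["Q: What is 2+2?\nA: 4", "Q: What is the capital of France?\nA: Paris"]
eval_texts = ["Q: What is 3+3?\nA: 6"]

def finetune_and_score(model_id, revision, steps=50, lr=2e-5):
    """Fine-tune one pre-training checkpoint and return its mean eval loss.
    Everything except the checkpoint is held fixed across runs."""
    tok = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step in range(steps):
        batch = tok(train_texts[step % len(train_texts)], return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    # Evaluate with the same held-out texts for every checkpoint.
    model.eval()
    total = 0.0
    with torch.no_grad():
        for text in eval_texts:
            batch = tok(text, return_tensors="pt")
            total += model(**batch, labels=batch["input_ids"]).loss.item()
    return total / len(eval_texts)

for label, rev in CHECKPOINTS.items():
    print(label, finetune_and_score("allenai/OLMo-1B-hf", rev))
```

If catastrophic overtraining is present, the checkpoint pre-trained on more tokens ends up with the higher post-fine-tuning eval loss.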

The authors explain this degradation through what they call “progressive sensitivity”: as models are pre-trained for longer, their internal parameters become increasingly sensitive to perturbations, whether from fine-tuning updates or added data. This heightened sensitivity means that even minor adjustments, or small amounts of noise, can seriously disrupt what the model has already learned.

The study supports this finding from multiple angles. When the researchers added Gaussian noise to pre-trained models, the resulting performance drop grew steadily with the number of pre-training tokens. They also validated the results in a separate setup involving fine-tuning on benchmarks, which yielded similar outcomes.
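
The Gaussian-noise probe is straightforward to picture: perturb every parameter by a small random amount and measure how much the language-modeling loss degrades. Here is a rough sketch under the same assumptions (the model id is a placeholder, and the paper’s exact perturbation protocol may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def noise_sensitivity(model_id, text, sigma=0.01, seed=0):
    """Return (clean_loss, perturbed_loss) after adding N(0, sigma^2)
    noise to every parameter of the model."""
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    model.eval()
    batch = tok(text, return_tensors="pt")

    def lm_loss():
        with torch.no_grad():
            return model(**batch, labels=batch["input_ids"]).loss.item()

    clean = lm_loss()
    torch.manual_seed(seed)
    with torch.no_grad():
        # Perturb every weight in place with small Gaussian noise.
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))
    return clean, lm_loss()

clean, noisy = noise_sensitivity("allenai/OLMo-1B-hf",
                                 "The quick brown fox jumps over the lazy dog.")
print(f"clean={clean:.3f}  perturbed={noisy:.3f}  gap={noisy - clean:.3f}")
```

Repeating this across checkpoints pre-trained on increasing token counts and tracking the loss gap is, in spirit, the progressive-sensitivity measurement: the paper’s reported signature is that the gap widens with tokens.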

The researchers acknowledge that their findings are not universal; their results suggest the risk of catastrophic overtraining is higher for smaller models. They also emphasize that overtraining can’t always be fixed, even with good techniques, if the pre-training and fine-tuning tasks aren’t well aligned.

“Catastrophic overtraining may be inevitable, even if the fine-tuning process is regularized, especially when the pre-training and fine-tuning tasks are misaligned,” the researchers note. This underscores the importance of aligning pre-training and fine-tuning objectives.

AI model pre-training is a crucial component of the development process. However, the study’s findings highlight the risks of overtraining. So, what is the sweet spot? According to the researchers, it involves striking a balance between base model quality and post-training adaptability.

Developers may need to rethink their approach to building AI models. As the researchers suggest, the focus should move away from simply scaling up data and model size and toward optimizing the entire training pipeline. “Our findings call for a renewed focus on model scaling that considers the entire training pipeline,” the researchers emphasize.

The authors emphasize the need for further research to explore the factors that determine when and how catastrophic overtraining occurs. However, a key takeaway from their study is that by adopting smarter strategies for AI development, less can sometimes be more. 
