Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Perplexity AI Compares XRP to Top Altcoins

Cohere, Ottawa sign non-binding agreement on government AI uses

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
VentureBeat AI

Hugging Face: 5 ways enterprises can slash AI costs without sacrificing performance 

By Advanced AI EditorAugust 19, 2025No Comments7 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email


Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now

Enterprises seem to accept it as a basic fact: AI models require a significant amount of compute; they simply have to find ways to obtain more of it. 

But it doesn’t have to be that way, according to Sasha Luccioni, AI and climate lead at Hugging Face. What if there’s a smarter way to use AI? What if, instead of striving for more (often unnecessary) compute and ways to power it, they can focus on improving model performance and accuracy? 

Ultimately, model makers and enterprises are focusing on the wrong issue: They should be computing smarter, not harder or doing more, Luccioni says. 

“There are smarter ways of doing things that we’re currently under-exploring, because we’re so blinded by: We need more FLOPS, we need more GPUs, we need more time,” she said. 

AI Scaling Hits Its Limits

Power caps, rising token costs, and inference delays are reshaping enterprise AI. Join our exclusive salon to discover how top teams are:

Turning energy into a strategic advantage

Architecting efficient inference for real throughput gains

Unlocking competitive ROI with sustainable AI systems

Secure your spot to stay ahead: https://bit.ly/4mwGngO

Here are five key learnings from Hugging Face that can help enterprises of all sizes use AI more efficiently. 

1: Right-size the model to the task 

Avoid defaulting to giant, general-purpose models for every use case. Task-specific or distilled models can match, or even surpass, larger models in terms of accuracy for targeted workloads — at a lower cost and with reduced energy consumption. 

Luccioni, in fact, has found in testing that a task-specific model uses 20 to 30 times less energy than a general-purpose one. “Because it’s a model that can do that one task, as opposed to any task that you throw at it, which is often the case with large language models,” she said. 

Distillation is key here; a full model could initially be trained from scratch and then refined for a specific task. DeepSeek R1, for instance, is “so huge that most organizations can’t afford to use it” because you need at least 8 GPUs, Luccioni noted. By contrast, distilled versions can be 10, 20 or even 30X smaller and run on a single GPU. 

In general, open-source models help with efficiency, she noted, as they don’t need to be trained from scratch. That’s compared to just a few years ago, when enterprises were wasting resources because they couldn’t find the model they needed; nowadays, they can start out with a base model and fine-tune and adapt it. 

“It provides incremental shared innovation, as opposed to siloed, everyone’s training their models on their datasets and essentially wasting compute in the process,” said Luccioni. 

It’s becoming clear that companies are quickly getting disillusioned with gen AI, as costs are not yet proportionate to the benefits. Generic use cases, such as writing emails or transcribing meeting notes, are genuinely helpful. However, task-specific models still require “a lot of work” because out-of-the-box models don’t cut it and are also more costly, said Luccioni.

This is the next frontier of added value. “A lot of companies do want a specific task done,” Luccioni noted. “They don’t want AGI, they want specific intelligence. And that’s the gap that needs to be bridged.” 

2. Make efficiency the default

Adopt “nudge theory” in system design, set conservative reasoning budgets, limit always-on generative features and require opt-in for high-cost compute modes.

In cognitive science, “nudge theory” is a behavioral change management approach designed to influence human behavior subtly. The “canonical example,” Luccioni noted, is adding cutlery to takeout: Having people decide whether they want plastic utensils, rather than automatically including them with every order, can significantly reduce waste.

“Just getting people to opt into something versus opting out of something is actually a very powerful mechanism for changing people’s behavior,” said Luccioni. 

Default mechanisms are also unnecessary, as they increase use and, therefore, costs because models are doing more work than they need to. For instance, with popular search engines such as Google, a gen AI summary automatically populates at the top by default. Luccioni also noted that, when she recently used OpenAI’s GPT-5, the model automatically worked in full reasoning mode on “very simple questions.”

“For me, it should be the exception,” she said. “Like, ‘what’s the meaning of life, then sure, I want a gen AI summary.’ But with ‘What’s the weather like in Montreal,’ or ‘What are the opening hours of my local pharmacy?’ I do not need a generative AI summary, yet it’s the default. I think that the default mode should be no reasoning.”

3. Optimize hardware utilization

Use batching; adjust precision and fine-tune batch sizes for specific hardware generation to minimize wasted memory and power draw. 

For instance, enterprises should ask themselves: Does the model need to be on all the time? Will people be pinging it in real time, 100 requests at once? In that case, always-on optimization is necessary, Luccioni noted. However, in many others, it’s not; the model can be run periodically to optimize memory usage, and batching can ensure optimal memory utilization. 

“It’s kind of like an engineering challenge, but a very specific one, so it’s hard to say, ‘Just distill all the models,’ or ‘change the precision on all the models,’” said Luccioni. 

In one of her recent studies, she found that batch size depends on hardware, even down to the specific type or version. Going from one batch size to plus-one can increase energy use because models need more memory bars. 

“This is something that people don’t really look at, they’re just like, ‘Oh, I’m gonna maximize the batch size,’ but it really comes down to tweaking all these different things, and all of a sudden it’s super efficient, but it only works in your specific context,” Luccioni explained. 

4. Incentivize energy transparency

It always helps when people are incentivized; to this end, Hugging Face earlier this year launched AI Energy Score. It’s a novel way to promote more energy efficiency, utilizing a 1- to 5-star rating system, with the most efficient models earning a “five-star” status. 

It could be considered the “Energy Star for AI,” and was inspired by the potentially-soon-to-be-defunct federal program, which set energy efficiency specifications and branded qualifying appliances with an Energy Star logo. 

“For a couple of decades, it was really a positive motivation, people wanted that star rating, right?,” said Luccioni. “Something similar with Energy Score would be great.”

Hugging Face has a leaderboard up now, which it plans to update with new models (DeepSeek, GPT-oss) in September, and continually do so every 6 months or sooner as new models become available. The goal is that model builders will consider the rating as a “badge of honor,” Luccioni said.

5. Rethink the “more compute is better” mindset

Instead of chasing the largest GPU clusters, begin with the question: “What is the smartest way to achieve the result?” For many workloads, smarter architectures and better-curated data outperform brute-force scaling.

“I think that people probably don’t need as many GPUs as they think they do,” said Luccioni. Instead of simply going for the biggest clusters, she urged enterprises to rethink the tasks GPUs will be completing and why they need them, how they performed those types of tasks before, and what adding extra GPUs will ultimately get them. 

“It’s kind of this race to the bottom where we need a bigger cluster,” she said. “It’s thinking about what you’re using AI for, what technique do you need, what does that require?” 

Daily insights on business use cases with VB Daily

If you want to impress your boss, VB Daily has you covered. We give you the inside scoop on what companies are doing with generative AI, from regulatory shifts to practical deployments, so you can share insights for maximum ROI.

Read our Privacy Policy

Thanks for subscribing. Check out more VB newsletters here.

An error occured.



Source link

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleWhatsApp Tests AI Writing Help Tool for iOS Beta Users
Next Article Tesla Model Y L officially launched: price, features, and more
Advanced AI Editor
  • Website

Related Posts

DeepSeek V3.1 just dropped — and it might be the most powerful open AI yet

August 19, 2025

VB AI Impact Series: Can you really govern multi-agent AI?

August 19, 2025

Keychain launches AI operating system for CPG manufacturers

August 19, 2025

Comments are closed.

Latest Posts

Barbara Hepworth Sculpture Will Remain in UK After £3.8 M. Raised

After 12-Year Hiatus, Egypt’s Alexandria Biennale Will Return

Senator Seeks Investigation into Jeffrey Epstein’s Work for Leon Black

Spike Lee’s ‘Highest 2 Lowest’ Features Art From His Own Collection

Latest Posts

Perplexity AI Compares XRP to Top Altcoins

August 19, 2025

Cohere, Ottawa sign non-binding agreement on government AI uses

August 19, 2025

ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR

August 19, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Perplexity AI Compares XRP to Top Altcoins
  • Cohere, Ottawa sign non-binding agreement on government AI uses
  • ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning – Takara TLDR
  • How Infosys built a generative AI solution to process oil and gas drilling data with Amazon Bedrock
  • MIT Report Finds Most AI Business Investments Fail, Reveals ‘GenAI Divide’ — Virtualization Review

Recent Comments

  1. ScottLag on C3 AI and Arcfield Announce Partnership to Accelerate AI Capabilities to Serve U.S. Defense and Intelligence Communities
  2. Jimmyjaito on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  3. Charliecep on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  4. nekretnine-zabljak-placevi-753 on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  5. Jimmyjaito on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.