'Western Qwen': IBM wows with Granite 4 LLM launch and hybrid Mamba/Transformer architecture

By Advanced AI Editor | October 7, 2025

IBM today announced the release of Granite 4.0, the newest generation of its home-grown family of open-source large language models (LLMs), designed to balance high performance with lower memory and cost requirements.

Despite being one of the oldest active tech companies in the U.S. (founded in 1911, 114 years ago), "Big Blue," as it's often nicknamed, has already wowed many AI industry workers and followers with the new Granite 4.0 family. The models offer high performance on third-party benchmarks and carry a permissive, business-friendly license (Apache 2.0) that allows developers and enterprises to freely take, modify, and deploy them for their own commercial purposes. Perhaps most importantly, they symbolically put the U.S., alongside OpenAI with its gpt-oss model family released earlier this summer, back in a competitive position against the growing raft of high-performing, new-generation open-source Chinese LLMs, especially from Alibaba's prolific Qwen team.

Meta, the parent company of Facebook and Instagram, was once seen as the U.S. and global leader in open-source LLMs with its Llama models. But after the disappointing release of the Llama 4 family in April and the absence of its planned, most powerful model, Llama 4 Behemoth, it has pursued a different strategy: partnering with outside labs like Midjourney on AI products while continuing to build out an expensive, in-house AI "Superintelligence" team.

Little wonder that AI engineer Alexander Doria (aka Pierre-Carl Langlais) quipped, alongside a Lethal Weapon meme, "ibm suiting up again after llama 4 fumbled" and "we finally have western qwen."

Hybrid (Transformer/Mamba) theory

At the heart of IBM's Granite 4.0 release is a new hybrid design that combines two very different architectures, or underlying organizational structures, for the LLMs in question: transformers and Mamba.

Transformers, introduced in 2017 by Vaswani and colleagues in the famous Google paper “Attention Is All You Need,” power most large language models in use today.

In this design, every token — essentially a small chunk of text, like a word or part of a word — can compare itself to every other token in the input. This “all-to-all” comparison is what gives transformers their strong ability to capture context and meaning across a passage.

The trade-off is efficiency: because the model must calculate relationships between every possible pair of tokens in the context window, computation and memory demands grow rapidly as the input gets longer. This quadratic scaling makes transformers costly to run on very long documents or at high volume.
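
To make that quadratic cost concrete, here is a minimal NumPy sketch of single-head scaled dot-product attention. It is illustrative only, with hypothetical dimensions and none of Granite's actual implementation details; the n × n score matrix is where the quadratic memory and compute come from:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention. Q, K, V: (n_tokens, d) arrays."""
    d = Q.shape[-1]
    # The (n_tokens, n_tokens) score matrix is the source of the quadratic
    # cost: memory and compute grow with n_tokens ** 2 as inputs get longer.
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

n, d = 1024, 64  # illustrative sizes, not a real model configuration
Q = K = V = np.random.randn(n, d)
out = scaled_dot_product_attention(Q, K, V)  # materializes a 1024 x 1024 matrix
```

Doubling the input length quadruples the size of that score matrix, which is exactly the scaling problem the next architecture addresses.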

Mamba, by contrast, is a newer architecture developed in late 2023 by researchers Albert Gu and Tri Dao at Carnegie Mellon University and Princeton University. Instead of comparing every token against all the others at once, it processes tokens one at a time, updating its internal state as it moves through the sequence. This design scales only linearly with input length, making it far more efficient at handling long documents or multiple requests at once. The trade-off is that transformers still tend to perform better in certain kinds of reasoning and “few-shot” learning, where it helps to hold many detailed token-to-token comparisons in memory.
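
For contrast, here is a toy sketch of the linear-time recurrence at the heart of state-space models like Mamba. This is a deliberate simplification: real Mamba makes the A, B, and C parameters input-dependent ("selective") and computes the scan with hardware-aware kernels. The skeleton only shows why cost grows linearly with sequence length, since the only carried memory is a fixed-size state:

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy state-space recurrence: one fixed-cost state update per token.

    x: (n_tokens, d_in) input sequence
    A: (d_state, d_state) state transition
    B: (d_state, d_in) input projection
    C: (d_out, d_state) output projection
    """
    h = np.zeros(A.shape[0])
    outputs = []
    for x_t in x:                # a single pass over the sequence
        h = A @ h + B @ x_t      # update the fixed-size hidden state
        outputs.append(C @ h)    # emit this token's output
    return np.stack(outputs)

# Illustrative sizes only: processing 10x more tokens costs ~10x more work.
x = np.random.randn(1024, 16)
A, B, C = np.eye(32) * 0.9, np.random.randn(32, 16), np.random.randn(8, 32)
y = ssm_scan(x, A, B, C)
```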

But whether the model is built on transformers, Mamba, or a hybrid of the two, the way it generates new words works the same way. At each step, the model doesn’t just pick from what’s already in the context window. Instead, it uses its internal weights — built from training on trillions of text samples — to predict the most likely next token from its entire vocabulary. That’s why, when prompted with “The capital of France is…,” the model can output “Paris” even if “Paris” isn’t in the input text. It has learned from countless training examples that “Paris” is a highly probable continuation in that context. In other words, the context window guides the prediction, but the embedding space — the model’s learned representation of all tokens it knows — supplies the actual words it can generate.
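
A toy numerical example of that final step, with a made-up four-word vocabulary and made-up weights, shows how a softmax over the whole vocabulary can pick "Paris" even though it never appears in the prompt:

```python
import numpy as np

# Hypothetical miniature model: a real LLM has ~50k+ tokens and learned
# weights, but the final prediction step has the same shape.
vocab = ["Paris", "London", "banana", "the"]
hidden = np.array([0.9, 0.1])             # final hidden state for the prompt
W_out = np.array([[4.0, 0.2],             # one output-embedding row per token
                  [2.5, 0.3],
                  [0.1, 0.1],
                  [1.0, 2.0]])

logits = W_out @ hidden                   # one score per vocabulary entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over the *entire* vocabulary
print(vocab[int(np.argmax(probs))])       # -> "Paris"
```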

By combining Mamba-2 layers with transformer blocks, Granite 4.0 seeks to offer the best of both worlds: the efficiency of Mamba and the contextual precision of transformers.
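
IBM has not published the layer recipe sketched here, so treat the following as a purely illustrative picture of the interleaving idea, with placeholder MambaBlock and AttentionBlock classes and a hypothetical attention-to-Mamba ratio:

```python
# Placeholder classes standing in for real Mamba-2 and attention layers.
class MambaBlock:
    def __call__(self, x):
        return x  # stand-in for linear-time sequence mixing

class AttentionBlock:
    def __call__(self, x):
        return x  # stand-in for quadratic all-to-all mixing

def build_hybrid_stack(n_layers, attention_every=4):
    """Interleave mostly-Mamba layers with periodic attention layers.

    The attention_every ratio is an assumption for illustration; the
    actual Granite 4.0 architecture may differ.
    """
    return [AttentionBlock() if (i + 1) % attention_every == 0 else MambaBlock()
            for i in range(n_layers)]

def forward(blocks, x):
    for block in blocks:
        x = block(x)
    return x

stack = build_hybrid_stack(32)  # 24 Mamba-style layers, 8 attention layers
```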

This is the first official Granite release to adopt the hybrid approach. IBM previewed it earlier in 2025 with the Granite-4.0-Tiny-Preview, but Granite 4.0 marks the company’s first full family of models built on the Mamba-transformer combination.

Granite 4.0 is being positioned as an enterprise-ready alternative to conventional transformer-based models, with particular emphasis on agentic AI tasks such as instruction following, function calling, and retrieval-augmented generation (RAG). The models are open sourced under the Apache 2.0 license, cryptographically signed for authenticity, and stand out as the first open language model family certified under ISO 42001, an international standard for AI governance and transparency.

Reducing memory needs, expanding accessibility

One of Granite 4.0’s defining features is its ability to significantly reduce GPU memory consumption compared to traditional large language models.

IBM reports that the hybrid Mamba-transformer design can cut RAM requirements by more than 70% in production environments, especially for workloads involving long contexts and multiple concurrent sessions.

Benchmarks released alongside the launch illustrate these improvements.

Granite-4.0-H-Small, a 32B-parameter mixture-of-experts model with 9B active parameters, maintains strong throughput on a single NVIDIA H100 GPU, even under the long-context, many-session workloads that typically strain transformer-only systems.

This efficiency translates directly into lower hardware costs for enterprises running intensive inference tasks.

For smaller-scale or edge deployments, Granite 4.0 offers two lighter options: Granite-4.0-H-Tiny, a 7B-parameter hybrid with 1B active parameters, and Granite-4.0-H-Micro, a 3B dense hybrid. IBM is also releasing Granite-4.0-Micro, a 3B transformer-only model intended for platforms not yet optimized for Mamba-based architectures.

Performance benchmarks

Performance metrics suggest that the new models not only reduce costs but also compete with larger systems on enterprise-critical tasks.

According to Stanford HELM’s IFEval benchmark, which measures how well LLMs follow instructions from users, Granite-4.0-H-Small surpasses nearly all open weight models in instruction-following accuracy, ranking just behind Meta’s much larger Llama 4 Maverick.

The models also show strong results on the Berkeley Function Calling Leaderboard v3, where Granite-4.0-H-Small achieves a favorable trade-off between accuracy and hosted API pricing. On retrieval-augmented generation tasks, Granite 4.0 models post some of the highest mean accuracy scores among open competitors.

Notably, IBM highlights that even Granite 4.0’s smallest models outperform Granite 3.3 8B, despite being less than half its size, underscoring the gains achieved through both architectural changes and refined training methods.

Trust, safety, and security

Alongside technical efficiency, IBM is emphasizing governance and trust. Granite is the first open model family to achieve ISO/IEC 42001:2023 certification, demonstrating compliance with international standards for AI accountability, data privacy, and explainability.

The company has also partnered with HackerOne to run a bug bounty program for Granite, offering up to $100,000 for vulnerabilities that could expose security flaws or adversarial risks. Additionally, every Granite 4.0 model checkpoint is cryptographically signed, enabling developers to verify provenance and integrity before deployment.

IBM provides indemnification for customers using Granite on its watsonx.ai platform, covering third-party intellectual property claims against AI-generated content.

Training and roadmap

Granite 4.0 models were trained on a 22-trillion-token corpus sourced from enterprise-relevant datasets including DataComp-LM, Wikipedia, and curated subsets designed to support language, code, math, multilingual tasks, and cybersecurity.

Post-training is split between instruction-tuned models, released today, and reasoning-focused “Thinking” variants, which are expected later this fall.

IBM plans to expand the family by the end of 2025 with additional models, including Granite 4.0 Medium for heavier enterprise workloads and Granite 4.0 Nano for edge deployments.

Broad availability across platforms

Granite 4.0 models are available immediately on Hugging Face and IBM watsonx.ai, with distribution also through partners such as Dell Technologies, Docker Hub, Kaggle, LM Studio, NVIDIA NIM, Ollama, OPAQUE, and Replicate.

Support through Amazon SageMaker JumpStart and Microsoft Azure AI Foundry is expected soon.

The hybrid architecture is supported in major inference frameworks, including vLLM 0.10.2 and Hugging Face Transformers.
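
As a rough sketch of what loading a Granite 4.0 checkpoint through Hugging Face Transformers might look like (the model identifier below is an assumption based on IBM's ibm-granite naming on Hugging Face; verify the exact ID on the organization page before use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model ID for illustration; check the ibm-granite org on
# Hugging Face for the exact identifiers and hardware requirements.
model_id = "ibm-granite/granite-4.0-h-tiny"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Summarize the key features of a hybrid Mamba/transformer model."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```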

Compatibility has also been extended to llama.cpp and MLX, although optimization work is ongoing. The models are also usable in Unsloth for fine-tuning and in Continue for custom AI coding assistants.

Enterprise focus

Early access testing by enterprise partners, including EY and Lockheed Martin, has guided the launch.

IBM highlights that the models are tailored for real-world enterprise needs, such as supporting multi-agent workflows, customer support automation, and large-scale retrieval systems.

Granite 4.0 models are available in both Base and Instruct forms, with Instruct variants optimized for enterprise instruction-following tasks. The upcoming “Thinking” series will target advanced reasoning.

Alternate hybrid Mamba / Transformer models

Besides IBM, several major efforts are already charting different designs for mixing Transformers with Mamba architecture:

  • AI21 Jamba: interleaves Transformer blocks and Mamba layers, with Mixture-of-Experts (MoE) in some layers. Supports context lengths up to 256K tokens and offers higher throughput and lower memory usage than pure Transformers while maintaining competitive benchmarks.
  • Nvidia Nemotron-H: replaces most attention layers with Mamba-2 blocks, retaining a few attention layers where needed. Demonstrates up to 3× faster inference throughput than pure-Transformer peers while keeping benchmark accuracy comparable.
  • Nemotron-Nano-2: a reasoning-optimized hybrid built on Nemotron's design. Reports up to 6× throughput improvement on reasoning tasks while matching or surpassing accuracy.
  • Domain-specific variants: hybridized architectures in multimodal models, such as swapping Mamba layers in for decoder components. These show that the hybrid approach extends beyond text into vision-language applications.

The Qwen family from Alibaba remains a dense, decoder-only Transformer architecture, with no Mamba or SSM layers in its mainline models. However, experimental offshoots like Vamba-Qwen2-VL-7B show that hybrids derived from Qwen are possible, especially in vision-language settings. For now, though, Qwen itself is not part of the hybrid wave.

What Granite 4.0 means for enterprises and what's next

Granite 4.0 reflects IBM’s strategy of combining open access with enterprise-grade safety, scalability, and efficiency. By focusing on lowering inference costs and reinforcing trust with governance standards, IBM positions the Granite family as a practical foundation for enterprises building AI applications at scale.

For the U.S., the release carries symbolic weight: with Meta stepping back from leading the open-weight frontier after the uneven reception of Llama 4, and with Alibaba’s Qwen family rapidly advancing in China, IBM’s move positions American enterprise once again as a competitive force in globally available models.

By making Granite 4.0 Apache-licensed, cryptographically signed, and ISO 42001-certified, IBM is signaling both openness and responsibility at a moment when trust, efficiency, and affordability are top of mind. This is especially enticing to U.S. and Western organizations that may be interested in open-source models but wary, rightly or not, of those originating from China over possible political ramifications and implications for U.S. government contracts.

For practitioners inside organizations, this positioning is not abstract. Lead AI engineers tasked with managing the full lifecycle of LLMs will see Granite 4.0’s smaller memory footprint as a way to deploy faster and scale with leaner teams.

Senior AI engineers in orchestration roles, who must balance budget limits with the need for efficiency, can take advantage of Granite’s compatibility with mainstream platforms like SageMaker and Hugging Face to streamline pipelines without locking into proprietary ecosystems.

Senior data engineers, responsible for integrating AI with complex data systems, will note the hybrid models’ efficiency on long-context inputs, enabling retrieval-augmented generation on large datasets at lower cost.

And for IT security directors charged with managing day-to-day defense, IBM’s bug bounty program, cryptographic signing, and ISO accreditation provide clear governance signals that align with enterprise compliance needs.

By targeting these distinct roles with a model family that is efficient, open, and hardened for enterprise use, IBM is not only courting adoption but also shaping a uniquely American answer to the open-source challenge posed by Qwen and other Chinese entrants. In doing so, Granite 4.0 places IBM at the center of a new phase in the global LLM race — one defined not just by size and speed, but by trust, cost efficiency, and readiness for real-world deployment.

With additional models scheduled for release before the end of the year and broader availability across major AI development platforms, Granite 4.0 is set to play a central role in IBM’s vision of enterprise-ready, open-source AI.


