Bill Dally, Nvidia chief scientist
The annual HotChips conference starts this Sunday, Aug. 24, in San Francisco. Nvidia is scheduled to present six sessions covering topics of interest to AI data center users and operators, and will make several key announcements, which I'll discuss in this article. (Like most AI semiconductor-related companies, Nvidia is a client of Cambrian-AI Research.)
NVLink Fusion is perhaps the most fascinating topic, opening NVLink, the company's secret sauce for interconnecting up to 72 accelerators and 36 CPUs in a rack, to the broader industry so that third-party CPU and accelerator vendors can build chips that connect to it. I'm working on another article that specifically covers how Qualcomm is using NVLink Fusion to enter the data center with its super-fast Arm-based Oryon CPUs. Here, I'll focus on how Nvidia is enabling AI to expand beyond a single data center, and on a new 4-bit format that could improve the efficiency of training AI models by as much as four-fold.
Nvidia will present six technical sessions at this year’s HotChips conference in San Francisco.
Connecting Multiple Data Centers for Massive AI
As older data centers hit power constraints that limit AI growth, many operators are seeking a way to break through the walls, and the distances, separating their facilities, connecting their network of data centers to deliver on the promise of AI and grow their business. Nvidia has launched a new Ethernet technology called Spectrum-XGS to enable these data centers to enter the world of giga-scale AI. This scale is needed for training large AI models, but is increasingly also used for agentic AI and reasoning models. Nvidia claims this network can nearly double the performance of multi-site AI workloads.
Nvidia has introduced new Ethernet technology to support multi-data-center integration.
NVFP4: 4-Bit AI Training as Accurate as 16-Bit?
Nvidia is unusual in the industry in having a large in-house research organization under Bill Dally, the company's chief scientist and senior vice president. Dr. Dally's team has developed many of the breakthroughs that have kept Nvidia in the lead and sent its competitors rushing to catch up with its multi-year head start.
Last year at HotChips '24, Dr. Dally said he thought there was more gold to mine in the realm of "quantization": the ever-shrinking data formats that can double or even quadruple performance efficiency. While we may be nearing the end of that road, the new 4-bit floating point NVFP4 is a pretty remarkable way to finish the story. NVFP4 will be available on all Blackwell and future Nvidia GPUs.
Nvidia has developed a new 4-Bit format for AI training that the company claims is as accurate as the 16-bit format used in nearly all AI training, enabling a four-fold increase in efficiency.
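To see how a 4-bit floating-point format can still be useful for training, consider a minimal sketch of FP4 (E2M1) quantization with a shared per-block scale, the general idea behind micro-scaled formats like NVFP4. This is an illustration of the concept only, not Nvidia's implementation; the block size and scale handling here are assumptions.

```python
# Illustrative sketch of 4-bit floating-point (E2M1) quantization with a
# shared per-block scale factor -- the general idea behind formats like
# NVFP4. Simplified for clarity; NOT Nvidia's actual implementation.

# The values representable in E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_VALUES = sorted(E2M1_VALUES + [-v for v in E2M1_VALUES if v != 0.0])

def quantize_block(block, max_code=6.0):
    """Quantize a small block of floats to E2M1 codes plus one shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / max_code  # shared scale maps the block into [-6, 6]
    codes = [min(E2M1_VALUES, key=lambda v, x=x: abs(x / scale - v)) for x in block]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate values: each 4-bit code times the shared scale."""
    return [c * scale for c in codes]

# A toy block of weights; with one FP scale per block, the 4-bit codes
# land close to the originals.
weights = [0.12, -0.03, 0.47, -0.51, 0.08, 0.9, -0.22, 0.31]
codes, scale = quantize_block(weights)
approx = dequantize_block(codes, scale)
```

The shared scale is why this works at all: the 4-bit code only needs to resolve values relative to the block's maximum, while the scale carries the dynamic range, which is roughly how block-scaled low-precision formats preserve accuracy.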
In another of Nvidia's research results, the company discussed the use of speculative decoding, where a small draft model proposes the next tokens and the GPU then uses AI (duh!) to verify whether each draft token is valid. Speculative execution has been used for decades in CPUs, and is now increasingly being applied to make AI inference more efficient. Note that Cerebras has disputed the representation of its numbers on the graph below.
Speculative decoding creates draft candidates for potential next tokens.
Nvidia Keeps its Research Dial Turned Up to 11
I hope you can attend the many fine sessions being offered next week at HotChips. I will, at least online! Every year this is the hottest conference for the industry's geekiest, both presenters and attendees. It is this sort of sharing of ideas and research results that feeds our industry and sustains the USA's leadership in semiconductors.
The Nvidia roadmap through 2028
Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor companies as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, Graphcore, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article. For more information, please visit our website at https://cambrian-AI.com.