Bill Dally, Nvidia chief scientist
The annual HotChips conference starts this Sunday, Aug. 24, in San Francisco. Nvidia is scheduled to present six sessions covering topics of interest to AI data center users and operators, and will make several key announcements, which I'll discuss in this article. (Like most AI semiconductor-related companies, Nvidia is a client of Cambrian-AI Research.)
NVLink Fusion is perhaps the most fascinating topic, opening NVLink, the company's secret sauce for interconnecting up to 72 accelerators and 36 CPUs in a rack, to the broader industry so that third-party CPU and accelerator vendors can build chips that connect to it. I'm working on another article that specifically covers how Qualcomm is using NVLink Fusion to enter the data center with its super-fast Arm-based Oryon CPUs. Here, I'll focus on how Nvidia is enabling AI to expand beyond a single data center, and on a new 4-bit format that could improve the efficiency of training AI models by as much as four-fold.
Nvidia will present six technical sessions at this year’s HotChips conference in San Francisco.
Connecting Multiple Data Centers for Massive AI
As older data centers hit power constraints that limit AI growth, many operators are seeking a way to break through the walls, and the distances, separating their facilities, connecting their network of data centers to deliver on the promise of AI and grow their business. Nvidia has launched a new Ethernet technology called Spectrum-XGS to enable these data centers to enter the world of giga-scale AI. This scale is needed for training large AI models, but is increasingly also used for agentic AI and reasoning models. Nvidia claims this network can nearly double the performance of multi-site AI workloads.
Nvidia has introduced new Ethernet technology to support multi-data-center integration.
NVFP4: 4-Bit AI Training as Accurate as 16-Bit?
Nvidia is unusual in the industry in having a large in-house research organization under Bill Dally, the company's chief scientist and senior vice president. Dr. Dally's team has developed many of the breakthroughs that have kept Nvidia in the lead and sent its competitors rushing to catch up with its multi-year head start.
Last year at HotChips '24, Dr. Dally said he thought there was more gold to mine in the realm of "quantization": the ever-shrinking data formats that can double or even quadruple performance efficiency. While we may be nearing the end of that road, the new 4-bit floating point NVFP4 is a pretty remarkable way to finish the story. NVFP4 will be available on all Blackwell and future Nvidia GPUs.
Nvidia has developed a new 4-Bit format for AI training that the company claims is as accurate as the 16-bit format used in nearly all AI training, enabling a four-fold increase in efficiency.
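To see how a 4-bit floating-point format can still be useful for training, consider a minimal sketch of FP4 (E2M1) quantization with a shared per-block scale, the general idea behind micro-scaled formats like NVFP4. This is an illustration of the concept only, not Nvidia's implementation; the block size and scale handling here are assumptions.

```python
# Illustrative sketch of 4-bit floating-point (E2M1) quantization with a
# shared per-block scale factor -- the general idea behind formats like
# NVFP4. Simplified for clarity; NOT Nvidia's actual implementation.

# The values representable in E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_VALUES = sorted(E2M1_VALUES + [-v for v in E2M1_VALUES if v != 0.0])

def quantize_block(block, max_code=6.0):
    """Quantize a small block of floats to E2M1 codes plus one shared scale."""
    amax = max(abs(x) for x in block) or 1.0
    scale = amax / max_code  # shared scale maps the block into [-6, 6]
    codes = [min(E2M1_VALUES, key=lambda v, x=x: abs(x / scale - v)) for x in block]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate values: each 4-bit code times the shared scale."""
    return [c * scale for c in codes]

# A toy block of weights; with one FP scale per block, the 4-bit codes
# land close to the originals.
weights = [0.12, -0.03, 0.47, -0.51, 0.08, 0.9, -0.22, 0.31]
codes, scale = quantize_block(weights)
approx = dequantize_block(codes, scale)
```

The shared scale is why this works at all: the 4-bit code only needs to resolve values relative to the block's maximum, while the scale carries the dynamic range, which is roughly how block-scaled low-precision formats preserve accuracy.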
In another of Nvidia's research results, the company discussed the use of speculative decoding, where a small draft model proposes the next tokens and the GPU then uses AI (duh!) to verify whether each draft token is valid. Speculative execution has been used for decades in CPUs, and is now increasingly being applied to make AI inference more efficient. Note that Cerebras has disputed the representation of its numbers on the graph below.
Speculative decoding creates draft candidates for potential next tokens.
Nvidia Keeps its Research Dial Turned Up to 11
I hope you can attend the many fine sessions being offered next week at HotChips. I will, at least online! Every year this is the hottest conference for the industry's geekiest, both presenters and attendees. It is this sort of sharing of ideas and research results that feeds our industry and sustains the USA's leadership in semiconductors.
The Nvidia roadmap through 2028
Disclosures: This article expresses the opinions of the author and is not to be taken as advice to purchase from or invest in the companies mentioned. My firm, Cambrian-AI Research, is fortunate to have many semiconductor companies as our clients, including Baya Systems, BrainChip, Cadence, Cerebras Systems, D-Matrix, Esperanto, Flex, Groq, IBM, Intel, Micron, NVIDIA, Qualcomm, Graphcore, SiMa.ai, Synopsys, Tenstorrent, Ventana Microsystems, and scores of investors. I have no investment positions in any of the companies mentioned in this article. For more information, please visit our website at https://cambrian-AI.com.