Weight Standardization (Paper Explained)

It’s common for neural networks to normalize their activations with layers such as BatchNorm or GroupNorm. This paper extends normalization to the weights of the network as well. This surprisingly simple change boosts performance and, combined with GroupNorm, leads to new state-of-the-art results.

Abstract:
In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small batch sizes are not enough for training networks with Batch Normalization (BN), while other normalization methods that do not rely on batch knowledge still have difficulty matching the performances of BN in large-batch training. Our WS ends this problem because when used with Group Normalization and trained with 1 image/GPU, WS is able to match or outperform the performances of BN trained with large batch sizes with only 2 more lines of code. In micro-batch training, WS significantly outperforms other normalization methods. WS achieves these superior results by standardizing the weights in the convolutional layers, which we show is able to smooth the loss landscape by reducing the Lipschitz constants of the loss and the gradients. The effectiveness of WS is verified on many tasks, including image classification, object detection, instance segmentation, video recognition, semantic segmentation, and point cloud recognition. The code is available here: this https URL.
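
To make the core idea in the abstract concrete, here is a minimal sketch of standardizing convolutional weights, in plain Python. It assumes each output channel's weights have been flattened into a list, and it uses the population variance with a small epsilon inside the square root. The function names, the epsilon value, and the sample numbers are illustrative assumptions, not taken from the authors' code.

```python
import math


def standardize_filter(weights, eps=1e-5):
    """Shift and scale one output channel's weights to ~zero mean, ~unit variance.

    `weights` is a flat list holding every weight of a single output channel
    (in_channels * kernel_height * kernel_width values). `eps` is an assumed
    stabilizer that avoids division by zero for constant filters.
    """
    n = len(weights)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / n  # population variance
    std = math.sqrt(var + eps)
    return [(w - mean) / std for w in weights]


def standardize_conv_weights(filters, eps=1e-5):
    """Standardize each output channel independently of the others."""
    return [standardize_filter(f, eps) for f in filters]


# Hypothetical layer with two output channels of four weights each.
filters = [
    [0.5, -1.0, 2.0, 0.1],
    [3.0, 3.5, 2.5, 3.0],
]

for f in standardize_conv_weights(filters):
    mean = sum(f) / len(f)
    var = sum((w - mean) ** 2 for w in f) / len(f)
    print(f"mean={mean:.6f} variance={var:.6f}")
```

In a real network this step would run on the convolution weights in every forward pass, before the convolution itself, so that gradients flow through the standardization. The activations would then still be normalized separately, for example with GroupNorm.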

Authors: Siyuan Qiao, Huiyu Wang, Chenxi Liu, Wei Shen, Alan Yuille

