Movement Pruning: Adaptive Sparsity by Fine-Tuning (Paper Explained)

Deep neural networks are large, and pruning has become an important part of ML product pipelines because it shrinks models while keeping their performance high. However, the classic method, magnitude pruning, is suboptimal for models obtained by transfer learning. This paper proposes an alternative, called movement pruning, and shows that it performs better, especially at high sparsity.
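For reference, here is a minimal NumPy sketch of the magnitude pruning baseline: it zeroes the fraction of weights with the smallest absolute value. The function name `magnitude_prune` and the toy matrix are illustrative assumptions, not code from the paper.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to zero.

    `sparsity` is the fraction of entries to remove (hypothetical helper).
    """
    k = int(round(sparsity * weights.size))  # number of weights to drop
    if k == 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    # Indices of the k entries with the smallest magnitude.
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(weights.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)

# Toy weight matrix; values chosen only for illustration.
w = np.array([[0.9, -0.05, 0.3],
              [-0.6, 0.02, -0.1]])
pruned = magnitude_prune(w, sparsity=0.5)
```

The criterion looks only at the current size of each weight, which is why it tends to keep whatever pretraining made large, regardless of what fine-tuning needs.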

OUTLINE:
0:00 – Intro & High-Level Overview
0:55 – Magnitude Pruning
4:25 – Transfer Learning
7:25 – The Problem with Magnitude Pruning in Transfer Learning
9:20 – Movement Pruning
22:20 – Experiments
24:20 – Improvements via Distillation
26:40 – Analysis of the Learned Weights

Paper:
Code:

Abstract:
Magnitude pruning is a widely used strategy for reducing model size in pure supervised learning; however, it is less effective in the transfer learning regime that has become standard for state-of-the-art natural language processing applications. We propose the use of movement pruning, a simple, deterministic first-order weight pruning method that is more adaptive to pretrained model fine-tuning. We give mathematical foundations to the method and compare it to existing zeroth- and first-order pruning methods. Experiments show that when pruning large pretrained language models, movement pruning shows significant improvements in high-sparsity regimes. When combined with distillation, the approach achieves minimal accuracy loss with down to only 3% of the model parameters.
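The idea in the abstract can be illustrated with a toy sketch, under some loud assumptions: the loss is a simple quadratic, the scores are updated with a straight-through estimator through a top-k mask, and all names (`topk_mask`, `movement_prune`, `target`) are hypothetical. It is not the authors' implementation, only a way to see how a first-order score that tracks the direction of weight movement differs from a magnitude criterion.

```python
import numpy as np

def topk_mask(scores, keep_fraction):
    """Boolean mask that keeps the top `keep_fraction` of entries by score."""
    k = max(1, int(round(keep_fraction * scores.size)))
    keep = np.argpartition(scores.ravel(), -k)[-k:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(scores.shape)

def movement_prune(w_pretrained, target, keep_fraction, steps=100, lr=0.1):
    """Toy movement pruning on the loss 0.5 * ||w * mask - target||^2.

    `target` stands in for the fine-tuning optimum (an assumption of this toy).
    """
    w = w_pretrained.astype(float).copy()
    s = np.zeros_like(w)  # importance scores, learned alongside the weights
    for _ in range(steps):
        mask = topk_mask(s, keep_fraction)
        grad_out = w * mask - target  # dL/d(masked weight)
        # Straight-through estimator: the score gradient ignores the mask,
        # so the score grows when a weight is pushed away from zero.
        s -= lr * grad_out * w
        # Only weights that are currently kept receive a weight update.
        w -= lr * grad_out * mask
    mask = topk_mask(s, keep_fraction)
    return w * mask, s

# Pretrained weights: two large ones that fine-tuning wants at zero,
# two small ones that fine-tuning wants to grow.
w_pre = np.array([0.9, 0.1, -0.8, -0.1])
target = np.array([0.0, 1.0, 0.0, -1.0])

pruned, scores = movement_prune(w_pre, target, keep_fraction=0.5)
magnitude_keep = topk_mask(np.abs(w_pre), 0.5)  # magnitude criterion, for contrast
movement_keep = pruned != 0
```

In this toy, the magnitude criterion keeps the two weights that are large after pretraining, while the movement scores end up keeping the two small weights that fine-tuning moves away from zero.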

Authors: Victor Sanh, Thomas Wolf, Alexander M. Rush
