Training More Effective Learned Optimizers, And Using Them To Train Themselves (Paper Explained)

#ai #research #optimization

Optimization is still the domain of simple, hand-crafted algorithms. An ML engineer not only has to pick a suitable one for their problem, but often also has to grid-search over its hyperparameters. This paper proposes to learn a single, unified optimization algorithm, given not by an equation but by an LSTM-based neural network, that acts as an optimizer for any deep learning problem and can ultimately optimize itself.
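To make the "equation vs. network" contrast concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: a tiny feed-forward net with hand-picked weights stands in for the LSTM-based optimizer, and the names `sgd_step`, `TinyLearnedOptimizer`, and `train_quadratic` are made up for illustration. The point is only that both rules map a gradient to an update, and that in the learned case the weights are what you would meta-train.

```python
import math


def sgd_step(grad, lr=0.1):
    """Hand-crafted rule: the update is a fixed equation of the gradient."""
    return -lr * grad


class TinyLearnedOptimizer:
    """Update rule given by a small network instead of an equation.

    Assumption: a one-hidden-layer tanh net over (gradient, momentum)
    stands in for the paper's LSTM-based optimizer. The weights are
    hand-picked here, not learned.
    """

    def __init__(self, w_hidden, w_out, decay=0.9):
        self.w_hidden = w_hidden  # per hidden unit: [w_grad, w_momentum, bias]
        self.w_out = w_out        # one output weight per hidden unit
        self.decay = decay
        self.momentum = 0.0

    def step(self, grad):
        # Feature 1 is the raw gradient, feature 2 a running average of it.
        self.momentum = self.decay * self.momentum + (1.0 - self.decay) * grad
        hidden = [
            math.tanh(w_g * grad + w_m * self.momentum + b)
            for w_g, w_m, b in self.w_hidden
        ]
        return sum(w * h for w, h in zip(self.w_out, hidden))


def train_quadratic(step_fn, steps=200, x0=0.0, target=3.0):
    """Minimize f(x) = (x - target)^2 using whatever update rule is passed in."""
    x = x0
    for _ in range(steps):
        grad = 2.0 * (x - target)
        x += step_fn(grad)
    return x


learned = TinyLearnedOptimizer(
    w_hidden=[[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]],
    w_out=[-0.1, -0.1],
)
x_sgd = train_quadratic(sgd_step)
x_learned = train_quadratic(learned.step)
```

Both runs should end up close to the minimum at 3.0. Swapping `sgd_step` for `learned.step` changes nothing in the training loop, which is what makes a learned optimizer a drop-in replacement.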

OUTLINE:
0:00 – Intro & Outline
2:20 – From Hand-Crafted to Learned Features
4:25 – Current Optimization Algorithm
9:40 – Learned Optimization
15:50 – Optimizer Architecture
22:50 – Optimizing the Optimizer using Evolution Strategies
30:30 – Task Dataset
34:00 – Main Results
36:50 – Implicit Regularization in the Learned Optimizer
41:05 – Generalization across Tasks
41:40 – Scaling Up
45:30 – The Learned Optimizer Trains Itself
47:20 – Pseudocode
49:45 – Broader Impact Statement
52:55 – Conclusion & Comments

Paper:

Abstract:
Much as replacing hand-designed features with learned functions has revolutionized how we solve perceptual tasks, we believe learned algorithms will transform how we train models. In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters. We introduce a new, neural network parameterized, hierarchical optimizer with access to additional features such as validation loss to enable automatic regularization. Most learned optimizers have been trained on only a single task, or a small number of tasks. We train our optimizers on thousands of tasks, making use of orders of magnitude more compute, resulting in optimizers that generalize better to unseen tasks. The learned optimizers not only perform well, but learn behaviors that are distinct from existing first order optimizers. For instance, they generate update steps that have implicit regularization and adapt as the problem hyperparameters (e.g. batch size) or architecture (e.g. neural network width) change. Finally, these learned optimizers show evidence of being useful for out of distribution tasks such as training themselves from scratch.
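The outline above lists a chapter on optimizing the optimizer with evolution strategies, and the abstract stresses training across many tasks. The following is a rough sketch of that outer loop under heavy simplifications: the "optimizer" is plain SGD with a single learnable log step size, the task set is three toy quadratics, and every name and constant (`TASKS`, `inner_train`, `es_gradient`, `meta_train`, `sigma`, `pairs`) is an assumption for illustration rather than anything taken from the paper.

```python
import math
import random

# Hypothetical task family: 1-D quadratics f(x) = a * (x - c)^2, given as (a, c).
TASKS = [(0.5, 1.0), (1.0, -2.0), (2.0, 3.0)]


def inner_train(theta, task, steps=10):
    """Train one task with the optimizer described by theta; return the final loss."""
    a, c = task
    lr = math.exp(theta[0])  # assumption: the optimizer is SGD with a learnable log step size
    x = 0.0
    for _ in range(steps):
        x -= lr * 2.0 * a * (x - c)
    return a * (x - c) ** 2


def meta_loss(theta):
    """Average final loss over all tasks: the quantity the outer loop minimizes."""
    return sum(inner_train(theta, task) for task in TASKS) / len(TASKS)


def es_gradient(theta, rng, sigma=0.1, pairs=8):
    """Antithetic evolution-strategies estimate of d(meta_loss)/d(theta)."""
    grad = [0.0] * len(theta)
    for _ in range(pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = meta_loss([t + sigma * e for t, e in zip(theta, eps)])
        minus = meta_loss([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * pairs)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def meta_train(theta, meta_steps=40, meta_lr=0.1, seed=0):
    """Outer loop: gradient descent on theta using the ES estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(meta_steps):
        grad = es_gradient(theta, rng)
        theta = [t - meta_lr * g for t, g in zip(theta, grad)]
    return theta


theta_before = [math.log(0.01)]
theta_after = meta_train(theta_before)
loss_before = meta_loss(theta_before)
loss_after = meta_loss(theta_after)
```

The ES estimate only needs evaluations of `meta_loss` at perturbed parameters, so the inner training run is treated as a black box and nothing is backpropagated through the unrolled steps. In this toy setup the learned step size should grow from its deliberately small starting value and the average final loss should drop.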

Authors: Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein


If you want to support me, the best thing to do is to share out the content 🙂

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
