Revisiting Glorot Initialization for Long-Range Linear Recurrences

arXiv:2505.19827v1 Announce Type: cross
Abstract: Proper initialization is critical for Recurrent Neural Networks (RNNs), particularly in long-range reasoning tasks, where repeated application of the same weight matrix can cause vanishing or exploding signals. A common baseline for linear recurrences is Glorot initialization, which is designed to ensure stable signal propagation but is derived under the infinite-width, fixed-length regime, an unrealistic setting for RNNs processing long sequences. In this work, we show that Glorot initialization is in fact unstable: small positive deviations in the spectral radius are amplified through time and cause the hidden state to explode. Our theoretical analysis demonstrates that sequences of length $t = O(\sqrt{n})$, where $n$ is the hidden width, are sufficient to induce instability. To address this, we propose a simple, dimension-aware rescaling of Glorot that shifts the spectral radius slightly below one, preventing rapid signal explosion or decay. These results suggest that standard initialization schemes may break down in the long-sequence regime, motivating a separate line of theory for stable recurrent initialization.
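To make the setting concrete, here is a minimal sketch of the kind of experiment the abstract describes: an input-free linear recurrence $h_{t+1} = W h_t$ with a Glorot-initialized square matrix, tracked through time. It assumes Gaussian Glorot initialization, for which the entry variance $2/(\text{fan\_in} + \text{fan\_out})$ reduces to $1/n$ when $W$ is $n \times n$. The abstract does not state the paper's rescaling formula, so the `gain` value below is a hypothetical placeholder chosen only to illustrate a uniform shrink, and the function names `glorot_square` and `hidden_norms` are likewise illustrative.

```python
import math
import random


def glorot_square(n, gain=1.0, seed=0):
    """Return an n x n Gaussian matrix with entry variance gain**2 / n.

    For a square matrix, Glorot's variance 2 / (fan_in + fan_out) equals 1 / n.
    """
    rng = random.Random(seed)
    std = gain / math.sqrt(n)
    return [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(n)]


def hidden_norms(W, h0, steps):
    """Return ||h_t|| for t = 0..steps under h_{t+1} = W h_t (no inputs)."""
    h = list(h0)
    norms = [math.sqrt(sum(x * x for x in h))]
    for _ in range(steps):
        h = [sum(w * x for w, x in zip(row, h)) for row in W]
        norms.append(math.sqrt(sum(x * x for x in h)))
    return norms


n, steps = 64, 32
h0 = [1.0 / math.sqrt(n)] * n  # unit-norm initial hidden state

# Hypothetical shrink factor: an assumption for illustration,
# NOT the dimension-aware rescaling proposed in the paper.
gain = 0.95

base = hidden_norms(glorot_square(n), h0, steps)
scaled = hidden_norms(glorot_square(n, gain=gain), h0, steps)

for t in (0, 8, 16, 32):
    print(f"t={t:2d}  glorot={base[t]:.4e}  rescaled={scaled[t]:.4e}")
```

Because the recurrence is linear, multiplying $W$ by a constant $c$ multiplies $\|h_t\|$ by exactly $c^t$. A small deviation in scale therefore compounds geometrically with sequence length, which is the mechanism the abstract points to when it says deviations in the spectral radius are amplified through time.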
