Linformer: Self-Attention With Linear Complexity (Paper Explained)

Transformers are notoriously resource-intensive because the memory and computation required by their self-attention mechanism grow quadratically with the length of the input sequence. The Linformer model gets around this by exploiting the fact that the information in the attention matrix is often of low rank and can therefore be approximated.
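To make the quadratic cost concrete, here is a minimal NumPy sketch of a single attention head. It builds the full n x n attention matrix and then computes the normalized cumulative sum of its singular values, one simple way to look at how concentrated the spectrum is. The random Q and K, the sizes n and d, and the helper names are illustrative assumptions, not taken from the paper or the video; a trained model's spectrum will look different.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(Q, K):
    # Scaled dot-product attention weights, shape (n, n).
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d), axis=-1)

def spectrum_energy(P):
    # Normalized cumulative sum of singular values (largest first).
    s = np.linalg.svd(P, compute_uv=False)
    return np.cumsum(s) / np.sum(s)

# Hypothetical sizes and random inputs, for illustration only.
rng = np.random.default_rng(0)
n, d = 256, 64
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

P = attention_matrix(Q, K)
energy = spectrum_energy(P)

print(P.shape)     # n * n entries: doubling n quadruples the storage
print(energy[:8])  # share of the spectrum held by the top singular values
```

The faster `energy` approaches 1, the better a low-rank matrix can stand in for the full attention matrix.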

OUTLINE:
0:00 – Intro & Overview
1:40 – The Complexity of Self-Attention
4:50 – Embedding Dimension & Multiple Heads
8:45 – Formal Attention
10:30 – Empirical Investigation into RoBERTa
20:00 – Theorem: Self-Attention is Low Rank
28:10 – Linear Self-Attention Method
36:15 – Theorem: Linear Self-Attention
44:10 – Language Modeling
46:40 – NLP Benchmarks
47:50 – Compute Time & Memory Gains
48:20 – Broader Impact Statement
49:55 – Conclusion

Paper:

Abstract:
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses O(n^2) time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from O(n^2) to O(n) in both time and space. The resulting linear transformer, the Linformer, performs on par with standard Transformer models, while being much more memory- and time-efficient.
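The idea in the abstract can be sketched in a few lines of NumPy: project the keys and values along the sequence dimension from length n down to a fixed k, so the attention weights have shape (n, k) instead of (n, n). This is a single-head sketch under stated assumptions. In the paper the projection matrices are learned; here E and F are random stand-ins, and the sizes and function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def standard_attention(Q, K, V):
    # Full attention: the weight matrix is (n, n).
    d = Q.shape[-1]
    P = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return P @ V

def linformer_attention(Q, K, V, E, F):
    # E and F have shape (k, n) and compress the sequence axis.
    d = Q.shape[-1]
    K_proj = E @ K                                        # (k, d)
    V_proj = F @ V                                        # (k, d)
    P_bar = softmax(Q @ K_proj.T / np.sqrt(d), axis=-1)   # (n, k)
    return P_bar @ V_proj                                 # (n, d)

# Hypothetical sizes; random E and F stand in for learned projections.
rng = np.random.default_rng(0)
n, d, k = 512, 64, 32
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
E = rng.standard_normal((k, n)) / np.sqrt(k)
F = rng.standard_normal((k, n)) / np.sqrt(k)

out = linformer_attention(Q, K, V, E, F)
print(out.shape)  # same output shape as standard attention
```

With k held fixed, the cost of the two matrix products involving the attention weights grows linearly in n rather than quadratically. As a sanity check, setting k = n and using identity matrices for E and F recovers standard attention exactly.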

Authors: Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma
