Close Menu
  • Home
  • AI Models
    • DeepSeek
    • xAI
    • OpenAI
    • Meta AI Llama
    • Google DeepMind
    • Amazon AWS AI
    • Microsoft AI
    • Anthropic (Claude)
    • NVIDIA AI
    • IBM WatsonX Granite 3.1
    • Adobe Sensi
    • Hugging Face
    • Alibaba Cloud (Qwen)
    • Baidu (ERNIE)
    • C3 AI
    • DataRobot
    • Mistral AI
    • Moonshot AI (Kimi)
    • Google Gemma
    • xAI
    • Stability AI
    • H20.ai
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Microsoft Research
    • Meta AI Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding & Startups
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • Expert Insights & Videos
    • Google DeepMind
    • Lex Fridman
    • Matt Wolfe AI
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • Matt Wolfe AI
    • The TechLead
    • Andrew Ng
    • OpenAI
  • Expert Blogs
    • François Chollet
    • Gary Marcus
    • IBM
    • Jack Clark
    • Jeremy Howard
    • Melanie Mitchell
    • Andrew Ng
    • Andrej Karpathy
    • Sebastian Ruder
    • Rachel Thomas
    • IBM
  • AI Policy & Ethics
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
    • EFF AI
    • European Commission AI
    • Partnership on AI
    • Stanford HAI Policy
    • Mozilla Foundation AI
    • Future of Life Institute
    • Center for AI Safety
    • World Economic Forum AI
  • AI Tools & Product Releases
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
    • Image Generation
    • Video Generation
    • Writing Tools
    • AI for Recruitment
    • Voice/Audio Generation
  • Industry Applications
    • Finance AI
    • Healthcare AI
    • Legal AI
    • Manufacturing AI
    • Media & Entertainment
    • Transportation AI
    • Education AI
    • Retail AI
    • Agriculture AI
    • Energy AI
  • AI Art & Entertainment
    • AI Art News Blog
    • Artvy Blog » AI Art Blog
    • Weird Wonderful AI Art Blog
    • The Chainsaw » AI Art
    • Artvy Blog » AI Art Blog
What's Hot

Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs – Takara TLDR

Reflection AI lands $2B at $8B valuation to expand frontier AI infrastructure and safety research

Here's what's slowing down your AI strategy — and how to fix it

Facebook X (Twitter) Instagram
Advanced AI News
  • Home
  • AI Models
    • OpenAI (GPT-4 / GPT-4o)
    • Anthropic (Claude 3)
    • Google DeepMind (Gemini)
    • Meta (LLaMA)
    • Cohere (Command R)
    • Amazon (Titan)
    • IBM (Watsonx)
    • Inflection AI (Pi)
  • AI Research
    • Allen Institue for AI
    • arXiv AI
    • Berkeley AI Research
    • CMU AI
    • Google Research
    • Meta AI Research
    • Microsoft Research
    • OpenAI Research
    • Stanford HAI
    • MIT CSAIL
    • Harvard AI
  • AI Funding
    • AI Funding Database
    • CBInsights AI
    • Crunchbase AI
    • Data Robot Blog
    • TechCrunch AI
    • VentureBeat AI
    • The Information AI
    • Sifted AI
    • WIRED AI
    • Fortune AI
    • PitchBook
    • TechRepublic
    • SiliconANGLE – Big Data
    • MIT News
    • Data Robot Blog
  • AI Experts
    • Google DeepMind
    • Lex Fridman
    • Meta AI Llama
    • Yannic Kilcher
    • Two Minute Papers
    • AI Explained
    • TheAIEdge
    • The TechLead
    • Matt Wolfe AI
    • Andrew Ng
    • OpenAI
    • Expert Blogs
      • François Chollet
      • Gary Marcus
      • IBM
      • Jack Clark
      • Jeremy Howard
      • Melanie Mitchell
      • Andrew Ng
      • Andrej Karpathy
      • Sebastian Ruder
      • Rachel Thomas
      • IBM
  • AI Tools
    • AI Assistants
    • AI for Recruitment
    • AI Search
    • Coding Assistants
    • Customer Service AI
  • AI Policy
    • ACLU AI
    • AI Now Institute
    • Center for AI Safety
  • Business AI
    • Advanced AI News Features
    • Finance AI
    • Healthcare AI
    • Education AI
    • Energy AI
    • Legal AI
LinkedIn Instagram YouTube Threads X (Twitter)
Advanced AI News
Yannic Kilcher

Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)

By Advanced AI EditorMay 5, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest Copy Link Telegram LinkedIn Tumblr Email
Share
Facebook Twitter LinkedIn Pinterest Email



#fastweights #deeplearning #transformers

Transformers are dominating Deep Learning, but their quadratic memory and compute requirements make them expensive to train and hard to use. Many papers have attempted to linearize the core module: the attention mechanism, using kernels – for example, the Performer. However, such methods are either not satisfactory or have other downsides, such as a reliance on random features. This paper establishes an intrinsic connection between linearized (kernel) attention and the much older Fast Weight Memory Systems, in part popularized by Jürgen Schmidhuber in the 90s. It shows the fundamental limitations of these algorithms and suggests new update rules and new kernels in order to fix these problems. The resulting model compares favorably to Performers on key synthetic experiments and real-world tasks.

OUTLINE:
0:00 – Intro & Overview
1:40 – Fast Weight Systems
7:00 – Distributed Storage of Symbolic Values
12:30 – Autoregressive Attention Mechanisms
18:50 – Connecting Fast Weights to Attention Mechanism
22:00 – Softmax as a Kernel Method (Performer)
25:45 – Linear Attention as Fast Weights
27:50 – Capacity Limitations of Linear Attention
29:45 – Synthetic Data Experimental Setup
31:50 – Improving the Update Rule
37:30 – Deterministic Parameter-Free Projection (DPFP) Kernel
46:15 – Experimental Results
50:50 – Conclusion & Comments

Paper:
Code:
Machine Learning Street Talk on Kernels:

Abstract:
We show the formal equivalence of linearised self-attention mechanisms and fast weight memories from the early ’90s. From this observation we infer a memory capacity limitation of recent linearised softmax attention variants. With finite memory, a desirable behaviour of fast weight memory models is to manipulate the contents of memory and dynamically interact with it. Inspired by previous work on fast weights, we propose to replace the update rule with an alternative rule yielding such behaviour. We also propose a new kernel function to linearise attention, balancing simplicity and effectiveness. We conduct experiments on synthetic retrieval problems as well as standard machine translation and language modelling tasks which demonstrate the benefits of our methods.

Authors: Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

Links:
TabNine Code Completion (Referral):
YouTube:
Twitter:
Discord:
BitChute:
Minds:
Parler:
LinkedIn:
BiliBili:

If you want to support me, the best thing to do is to share out the content 🙂

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar:
Patreon:
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

source

Follow on Google News Follow on Flipboard
Share. Facebook Twitter Pinterest LinkedIn Tumblr Email Copy Link
Previous ArticleDeepMind Takes A Step Towards General AI! 🤖
Next Article Fireside Wisdom: Clarence Wooten at Spelman
Advanced AI Editor
  • Website

Related Posts

[Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant)

October 11, 2025

AGI is not coming!

August 9, 2025

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

July 23, 2025
Leave A Reply

Latest Posts

Smithsonian Closes Museums Amid Government Shutdown

The Rubin Names 2025 Art Prize, Research and Art Projects Grants

Kochi-Muziris Biennial Announces 66 Artists for December Exhibition

Instagram Launches ‘Rings’ Awards for Creators—With KAWS as a Judge

Latest Posts

Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs – Takara TLDR

October 12, 2025

Reflection AI lands $2B at $8B valuation to expand frontier AI infrastructure and safety research

October 12, 2025

Here's what's slowing down your AI strategy — and how to fix it

October 12, 2025

Subscribe to News

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

Recent Posts

  • Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs – Takara TLDR
  • Reflection AI lands $2B at $8B valuation to expand frontier AI infrastructure and safety research
  • Here's what's slowing down your AI strategy — and how to fix it
  • The Grand AGI Delusion
  • Tencent unveils new AI model ‘Hunyuan T1’ that rivals DeepSeek R1 in performance and price

Recent Comments

  1. Juniorfar on 1-800-CHAT-GPT—12 Days of OpenAI: Day 10
  2. Owen Yashinski on Harvard, MIT Use Law To Limit Legal Payouts While Sitting On Endowments Worth Billions – IJR
  3. Sammie Zupan on Here’s how Apple’s new local AI models perform against Google’s
  4. Adah Paling on Harvard, MIT Use Law To Limit Legal Payouts While Sitting On Endowments Worth Billions – IJR
  5. ChaosReelV7Nalay on Bitcoin Security: Here’s What Makes The OG Blockchain Safer Than Fort Knox

Welcome to Advanced AI News—your ultimate destination for the latest advancements, insights, and breakthroughs in artificial intelligence.

At Advanced AI News, we are passionate about keeping you informed on the cutting edge of AI technology, from groundbreaking research to emerging startups, expert insights, and real-world applications. Our mission is to deliver high-quality, up-to-date, and insightful content that empowers AI enthusiasts, professionals, and businesses to stay ahead in this fast-evolving field.

Subscribe to Updates

Subscribe to our newsletter and never miss our latest news

Subscribe my Newsletter for New Posts & tips Let's stay updated!

LinkedIn Instagram YouTube Threads X (Twitter)
  • Home
  • About Us
  • Advertise With Us
  • Contact Us
  • DMCA
  • Privacy Policy
  • Terms & Conditions
© 2025 advancedainews. Designed by advancedainews.

Type above and press Enter to search. Press Esc to cancel.