Long-sequence modeling faces a fundamental trade-off between the efficiency
of compressive fixed-size memory in RNN-like models and the fidelity of
lossless growing memory in attention-based Transformers. Inspired by the
Multi-Store Model in cognitive science, we introduce a memory framework for
artificial neural networks. Our method maintains a sliding window of the
Transformer’s KV cache as lossless short-term memory, while a learnable module
termed Artificial Hippocampus Network (AHN) recurrently compresses
out-of-window information into a fixed-size compact long-term memory. To
validate this framework, we instantiate AHNs using modern RNN-like
architectures, including Mamba2, DeltaNet, and Gated DeltaNet. Extensive
experiments on the long-context benchmarks LV-Eval and InfiniteBench demonstrate
that AHN-augmented models consistently outperform sliding-window baselines and
achieve performance comparable to, or even better than, full-attention models, while
substantially reducing computational and memory requirements. For instance,
augmenting Qwen2.5-3B-Instruct with AHNs reduces inference FLOPs by 40.5%
and memory cache by 74.0%, while improving its average score on LV-Eval (128k
sequence length) from 4.41 to 5.88. Code is available at:
https://github.com/ByteDance-Seed/AHN.
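
To make the described memory layout concrete, the following is a minimal sketch, not the released implementation: a sliding window of exact key/value pairs (lossless short-term memory) plus a fixed-size recurrent state that absorbs tokens evicted from the window (compact long-term memory). The gated outer-product update, the class name, and the `window_size` parameter are illustrative assumptions standing in for the Mamba2 / DeltaNet / Gated DeltaNet instantiations used in the paper.

```python
import torch


class SlidingWindowWithCompressiveMemory:
    """Toy per-head memory: exact KV window + fixed-size compressed state."""

    def __init__(self, d_head: int, window_size: int = 4):
        self.window_size = window_size
        self.keys, self.values = [], []            # lossless short-term memory
        self.state = torch.zeros(d_head, d_head)   # fixed-size long-term memory

    def _compress(self, k: torch.Tensor, v: torch.Tensor) -> None:
        # Fold an evicted (k, v) pair into the fixed-size state with a
        # data-dependent forget gate (a simplified, DeltaNet-flavored update;
        # the gate choice here is a toy assumption).
        gate = torch.sigmoid(k.mean())
        self.state = gate * self.state + torch.outer(k, v)

    def append(self, k: torch.Tensor, v: torch.Tensor) -> None:
        self.keys.append(k)
        self.values.append(v)
        if len(self.keys) > self.window_size:      # evict oldest out-of-window pair
            self._compress(self.keys.pop(0), self.values.pop(0))

    def read(self, q: torch.Tensor) -> torch.Tensor:
        # Combine exact attention over the window with a linear read of the
        # compressed state; a trained model would learn how to mix the two.
        mem_out = q @ self.state                   # long-term (compressed) read
        if not self.keys:
            return mem_out
        K = torch.stack(self.keys)                 # (w, d)
        V = torch.stack(self.values)               # (w, d)
        attn = torch.softmax(K @ q / K.shape[-1] ** 0.5, dim=0)
        return attn @ V + mem_out                  # short-term + long-term


if __name__ == "__main__":
    # Cache size stays bounded by window_size regardless of sequence length.
    d = 8
    mem = SlidingWindowWithCompressiveMemory(d_head=d, window_size=4)
    for _ in range(32):
        mem.append(torch.randn(d), torch.randn(d))
    print(mem.read(torch.randn(d)).shape)          # torch.Size([8])
```

The key property the sketch illustrates is that memory and per-token compute no longer grow with sequence length: only the window is stored exactly, and everything older is folded into a constant-size state.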