Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context - Takara TLDR

A key component of in-context reasoning is the ability of language models
(LMs) to bind entities for later retrieval. For example, an LM might represent
“Ann loves pie” by binding “Ann” to “pie”, allowing it to later retrieve “Ann”
when asked “Who loves pie?” Prior research on short lists of bound entities
found strong evidence that LMs implement such retrieval via a positional
mechanism, where “Ann” is retrieved based on its position in context. In this
work, we find that this mechanism generalizes poorly to more complex settings;
as the number of bound entities in context increases, the positional mechanism
becomes noisy and unreliable in middle positions. To compensate for this, we
find that LMs supplement the positional mechanism with a lexical mechanism
(retrieving “Ann” using its bound counterpart “pie”) and a reflexive mechanism
(retrieving “Ann” through a direct pointer). Through extensive experiments on
nine models and ten binding tasks, we uncover a consistent pattern in how LMs
mix these mechanisms to drive model behavior. We leverage these insights to
develop a causal model combining all three mechanisms that estimates next-token
distributions with 95% agreement. Finally, we show that our model generalizes
to substantially longer inputs of open-ended text interleaved with entity
groups, further demonstrating the robustness of our findings in more natural
settings. Overall, our study establishes a more complete picture of how LMs
bind and retrieve entities in-context.
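
As a rough illustration of the three mechanisms described above, the toy sketch below scores each entity group in three ways and mixes the scores. It is not the paper's causal model: the entity groups, the function names, the mixing weights, and the `spread` noise parameter are all assumptions chosen for illustration, and the "reflexive" pointer is modeled simply as a direct match on the target entity.

```python
# Toy illustration only: groups, weights, and the noise model are assumptions,
# not values or methods taken from the paper.
GROUPS = [("Ann", "pie"), ("Bob", "tea"), ("Cid", "jam")]


def positional_scores(n, index, spread):
    """Score groups by distance from the queried position.

    spread=0 gives a one-hot pick; a larger spread (below 1) leaks
    probability mass to neighboring positions, mimicking a noisy signal.
    """
    raw = [spread ** abs(i - index) for i in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def lexical_scores(groups, counterpart):
    """One-hot on the group whose bound counterpart (e.g. "pie") matches."""
    return [1.0 if bound == counterpart else 0.0 for _, bound in groups]


def reflexive_scores(groups, target):
    """One-hot on the group whose target entity matches a direct pointer."""
    return [1.0 if entity == target else 0.0 for entity, _ in groups]


def mix(weights, score_lists):
    """Weighted sum of the per-mechanism scores for each group."""
    n = len(score_lists[0])
    return [sum(w * s[i] for w, s in zip(weights, score_lists)) for i in range(n)]


def retrieve(groups, index, counterpart, target,
             weights=(0.4, 0.3, 0.3), spread=0.5):
    """Return the entity with the highest mixed score, plus the scores."""
    mixed = mix(weights, [
        positional_scores(len(groups), index, spread),
        lexical_scores(groups, counterpart),
        reflexive_scores(groups, target),
    ])
    best = max(range(len(groups)), key=mixed.__getitem__)
    return groups[best][0], mixed


if __name__ == "__main__":
    # All three signals point at the second group.
    print(retrieve(GROUPS, index=1, counterpart="tea", target="Bob"))
    # The positional signal points at group 0, the other two at group 2.
    print(retrieve(GROUPS, index=0, counterpart="jam", target="Cid"))
```

Passing signals that disagree, as in the second call, shows how a mixture can still settle on one entity when the positional signal alone would pick another.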
