Paper page - Token Bottleneck: One Token to Remember Dynamics

ToBo is a self-supervised learning method that creates compact, temporally aware visual representations for sequential scene understanding tasks, outperforming baselines in both simulated and real-world environments.

Deriving compact and temporally aware visual representations from dynamic
scenes is essential for successful execution of sequential scene understanding
tasks such as visual tracking and robotic manipulation. In this paper, we
introduce Token Bottleneck (ToBo), a simple and intuitive self-supervised
learning pipeline that squeezes a scene into a bottleneck token and predicts
the subsequent scene using minimal patches as hints. The ToBo pipeline
facilitates the learning of sequential scene representations by conservatively
encoding the reference scene into a compact bottleneck token during the squeeze
step. In the expansion step, we guide the model to capture temporal dynamics by
predicting the target scene using the bottleneck token along with a few target
patches as hints. This design encourages the vision backbone to embed temporal
dependencies, thereby enabling understanding of dynamic transitions across
scenes. Extensive experiments on diverse sequential tasks, including video
label propagation and robot manipulation in simulated environments, demonstrate
that ToBo outperforms baselines. Moreover, deploying our pre-trained
model on physical robots confirms its robustness and effectiveness in
real-world environments. We further validate the scalability of ToBo across
different model scales.
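
The squeeze and expansion steps described above can be illustrated with a toy data-flow sketch. This is not the authors' implementation: the mean-pooling `squeeze` is only a stand-in for the vision backbone, the 0.9 mask ratio is an assumed value (the abstract says only that a few target patches are kept), and every function name here is hypothetical.

```python
import random


def squeeze(reference_patches):
    """Stand-in for the encoder (assumption): average the reference
    patch vectors into a single bottleneck token."""
    n = len(reference_patches)
    dim = len(reference_patches[0])
    return [sum(p[d] for p in reference_patches) / n for d in range(dim)]


def select_hints(num_patches, mask_ratio, rng):
    """Choose the indices of the few target patches left visible as hints.
    At least one patch is always kept."""
    num_visible = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_visible))


def build_decoder_input(bottleneck, target_patches, hint_idx):
    """Expansion-step input: the bottleneck token followed by the hint patches."""
    return [bottleneck] + [target_patches[i] for i in hint_idx]


def masked_targets(target_patches, hint_idx):
    """Patches the decoder would have to predict: every target patch
    that was not given as a hint, keyed by patch index."""
    hidden = set(range(len(target_patches))) - set(hint_idx)
    return {i: target_patches[i] for i in sorted(hidden)}


if __name__ == "__main__":
    rng = random.Random(0)
    # 16 toy patches of dimension 4 for the reference and target scenes.
    reference = [[float(i + d) for d in range(4)] for i in range(16)]
    target = [[float(i * d) for d in range(4)] for i in range(16)]

    token = squeeze(reference)
    hints = select_hints(len(target), mask_ratio=0.9, rng=rng)
    decoder_input = build_decoder_input(token, target, hints)
    to_predict = masked_targets(target, hints)
    print(len(decoder_input), "decoder tokens;", len(to_predict), "patches to predict")
```

The point of the sketch is the information constraint: with so few hint patches, a decoder can only reconstruct the hidden target patches if the single bottleneck token carries enough about the reference scene, which is the pressure the abstract credits with producing temporally aware representations.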
