Autoregressive Universal Video Segmentation Model - Takara TLDR

Recent video foundation models such as SAM2 excel at prompted video
segmentation by treating masks as a general-purpose primitive. However, many
real-world settings require unprompted segmentation, which aims to detect and
track all objects in a video without external cues, and today's landscape for
these tasks remains fragmented across task-specific models and pipelines. We recast streaming video
segmentation as sequential mask prediction, analogous to language modeling, and
introduce the Autoregressive Universal Segmentation Model (AUSM), a single
architecture that unifies both prompted and unprompted video segmentation.
Built on recent state-space models, AUSM maintains a fixed-size spatial state
and scales to video streams of arbitrary length. Furthermore, all components of
AUSM are designed for parallel training across frames, yielding substantial
speedups over iterative training. On standard benchmarks (DAVIS17, YouTube-VOS
2018 & 2019, MOSE, YouTube-VIS 2019 & 2021, and OVIS) AUSM outperforms prior
universal streaming video segmentation methods and achieves up to 2.5x faster
training on 16-frame sequences.
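The abstract makes two efficiency claims: a fixed-size state lets the model handle streams of arbitrary length, and its components can be trained in parallel across frames. Both can be illustrated with a generic linear recurrence of the kind used in state-space models. The sketch below is not AUSM's architecture. The toy grid sizes, the time-invariant per-cell `decay`, and the function names `streaming_scan` and `parallel_scan` are all hypothetical. It computes the same per-frame states two ways: frame by frame with constant state memory, as in streaming inference, and for a whole clip at once, as in training.

```python
import numpy as np

# Hypothetical toy sizes: a 4x4 spatial state grid with 8 channels.
H, W, C = 4, 4, 8


def streaming_scan(decay, inputs):
    """Process frames one at a time, keeping only a fixed-size state.

    decay:  (H, W, C) per-cell decay factors (assumed time-invariant).
    inputs: (T, H, W, C) per-frame features (hypothetical stand-ins).
    Returns the state after every frame, shape (T, H, W, C).
    """
    state = np.zeros(inputs.shape[1:])
    outputs = []
    for x_t in inputs:
        # h_t = a * h_{t-1} + x_t; `state` stays (H, W, C) however long the stream is.
        state = decay * state + x_t
        outputs.append(state.copy())
    return np.stack(outputs)


def parallel_scan(decay, inputs):
    """Compute every frame's state at once, as when training on a whole clip.

    Uses the closed form h_t = sum_{s <= t} a^(t - s) * x_s, which has no
    frame-to-frame dependency, so all T states are computed together.
    """
    T = inputs.shape[0]
    t = np.arange(T)
    lag = t[:, None] - t[None, :]             # (T, T) matrix of t - s
    causal = (lag >= 0)[..., None, None, None]  # only past and current frames contribute
    exponents = np.maximum(lag, 0)[..., None, None, None]
    # weights[t, s] = a^(t - s) for s <= t and 0 otherwise; shape (T, T, H, W, C).
    weights = np.where(causal, decay[None, None] ** exponents, 0.0)
    return np.einsum("tshwc,shwc->thwc", weights, inputs)


# Usage: both paths agree on a random 16-frame clip.
rng = np.random.default_rng(0)
decay = rng.uniform(0.5, 0.99, size=(H, W, C))
frames = rng.normal(size=(16, H, W, C))
assert np.allclose(streaming_scan(decay, frames), parallel_scan(decay, frames))
```

The closed form here materializes a T×T weight tensor, so it only suits short clips. Practical state-space models typically use a parallel (associative) scan or chunked computation instead, which keeps the cross-frame parallelism without the quadratic memory.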
