Video Anomaly Understanding (VAU) is essential for applications such as smart
cities, security surveillance, and disaster alert systems, yet remains
challenging due to its demand for fine-grained spatio-temporal perception and
robust reasoning under ambiguity. Despite advances in anomaly detection,
existing methods often lack interpretability and struggle to capture the causal
and contextual aspects of abnormal events. This limitation is further
compounded by the absence of comprehensive benchmarks for evaluating reasoning
ability in anomaly scenarios. To address both challenges, we introduce VAU-R1,
a data-efficient framework built upon Multimodal Large Language Models (MLLMs),
which enhances anomaly reasoning through Reinforcement Fine-Tuning (RFT).
In addition, we propose VAU-Bench, the first Chain-of-Thought benchmark tailored
for video anomaly reasoning, featuring multiple-choice QA, detailed rationales,
temporal annotations, and descriptive captions. Empirical results show that
VAU-R1 significantly improves question-answering accuracy, temporal grounding,
and reasoning coherence across diverse contexts. Together, our method and
benchmark establish a strong foundation for interpretable and reasoning-aware
video anomaly understanding. Our code is available at
https://github.com/GVCLab/VAU-R1.