ROSE: Remove Objects With Side Effects In Videos - Takara TLDR

Video object removal has achieved strong performance thanks to the recent
success of video generative models. However, existing works struggle to
eliminate the side effects of objects, e.g., their shadows and reflections,
because paired video data for supervision is scarce. This paper presents ROSE
(Remove Objects with Side Effects), a framework that systematically studies an
object's effects on its environment. These effects fall into five common cases:
shadows, reflections, light, translucency, and mirrors. Because paired videos
exhibiting these effects are hard to curate, we leverage a 3D rendering engine
for synthetic data generation. We carefully construct a fully automatic
data-preparation pipeline that simulates a large-scale paired dataset with
diverse scenes, objects, shooting angles, and camera trajectories. ROSE is
implemented as a video inpainting model built on a diffusion transformer. To
localize all object-correlated areas, the entire video is fed into the model
for reference-based erasing. Moreover, additional supervision is introduced to
explicitly predict the areas affected by side effects, which can be revealed
through the differential mask between the paired videos. To fully investigate
model performance on the removal of various side effects, we present a new
benchmark, dubbed ROSE-Bench, that incorporates both common scenarios and the
five special side effects for comprehensive evaluation. Experimental results
demonstrate that ROSE outperforms existing video object erasing models and
generalizes well to real-world video scenarios. The project page is
https://rose2025-inpaint.github.io/.
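The differential-mask idea can be illustrated with a minimal sketch. It assumes the two paired renderings (one with the object, one without) are frame-aligned float arrays of shape (T, H, W, C) in [0, 1]. The hypothetical `differential_mask` helper marks every pixel whose color changes by more than a hypothetical `threshold` when the object is present. If an object mask is also supplied, removing it leaves only the side-effect region (shadow, reflection, and so on). The threshold, array layout, and helper name are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def differential_mask(video_with_obj, video_without_obj, threshold=0.05, object_mask=None):
    """Hypothetical sketch: flag pixels that differ between paired renderings.

    video_with_obj / video_without_obj: arrays of shape (T, H, W, C), values in [0, 1].
    threshold: assumed per-pixel tolerance (not a value from the paper).
    object_mask: optional bool array (T, H, W); its pixels are excluded so that
    only the side-effect area remains.
    """
    if video_with_obj.shape != video_without_obj.shape:
        raise ValueError("paired videos must have identical shapes")
    diff = np.abs(video_with_obj.astype(np.float32) - video_without_obj.astype(np.float32))
    # A pixel counts as changed if any color channel moved past the threshold.
    mask = diff.max(axis=-1) > threshold  # bool array of shape (T, H, W)
    if object_mask is not None:
        mask &= ~object_mask  # drop the object itself, keep only its side effects
    return mask


# Toy example: a 1-frame 4x4 scene with a uniform gray floor.
T, H, W = 1, 4, 4
clean = np.full((T, H, W, 3), 0.8, dtype=np.float32)
with_obj = clean.copy()
with_obj[0, 0, 0] = 0.1  # pixel covered by the object
with_obj[0, 1, 1] = 0.5  # pixel darkened by the object's shadow
obj = np.zeros((T, H, W), dtype=bool)
obj[0, 0, 0] = True

full = differential_mask(with_obj, clean)
side = differential_mask(with_obj, clean, object_mask=obj)
print(int(full.sum()), int(side.sum()))  # 2 1
```

Taking the maximum over channels, rather than a grayscale difference, is one simple choice that keeps colored reflections from cancelling out; a real pipeline rendering noise-free synthetic pairs might use a much tighter tolerance.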
