Paper Page - Geometry Forcing: Marrying Video Diffusion And 3D Representation For Consistent World Modeling

Videos inherently represent 2D projections of a dynamic 3D world. However,
our analysis suggests that video diffusion models trained solely on raw video
data often fail to capture meaningful geometry-aware structure in their
learned representations. To bridge this gap between video diffusion models and
the underlying 3D nature of the physical world, we propose Geometry Forcing, a
simple yet effective method that encourages video diffusion models to
internalize latent 3D representations. Our key insight is to guide the model’s
intermediate representations toward geometry-aware structure by aligning them
with features from a pretrained geometric foundation model. To this end, we
introduce two complementary alignment objectives: Angular Alignment, which
enforces directional consistency via cosine similarity, and Scale Alignment,
which preserves scale-related information by regressing unnormalized geometric
features from normalized diffusion representations. We evaluate Geometry Forcing
on both camera view-conditioned and action-conditioned video generation tasks.
Experimental results demonstrate that our method substantially improves visual
quality and 3D consistency over the baseline methods. Project page:
https://GeometryForcing.github.io.
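
The two alignment objectives described above can be illustrated with a minimal sketch. The abstract specifies only that Angular Alignment uses cosine similarity and that Scale Alignment regresses unnormalized geometric features from normalized diffusion representations. Everything else here is an assumption made for illustration: the function names, the "1 - cosine" form of the angular loss, the linear regression head (`weight`, `bias`), the mean-squared-error form of the scale loss, and the premise that both feature sets already share the same dimensionality.

```python
import math


def l2_normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def angular_alignment_loss(diff_feats, geo_feats):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    Assumed form: the source only says directional consistency is
    enforced via cosine similarity.
    """
    total = 0.0
    for h, g in zip(diff_feats, geo_feats):
        h_hat, g_hat = l2_normalize(h), l2_normalize(g)
        total += 1.0 - sum(a * b for a, b in zip(h_hat, g_hat))
    return total / len(diff_feats)


def scale_alignment_loss(diff_feats, geo_feats, weight, bias):
    """Mean squared error between a linear prediction made from the
    normalized diffusion feature and the raw (unnormalized) geometric
    feature.

    Assumed form: the linear head and the MSE are hypothetical; the
    source only says unnormalized geometric features are regressed
    from normalized diffusion representations.
    """
    total, count = 0.0, 0
    for h, g in zip(diff_feats, geo_feats):
        h_hat = l2_normalize(h)
        pred = [
            sum(w * x for w, x in zip(row, h_hat)) + b
            for row, b in zip(weight, bias)
        ]
        total += sum((p - t) ** 2 for p, t in zip(pred, g))
        count += len(g)
    return total / count
```

In this sketch the angular term ignores feature magnitude entirely, since both inputs are normalized before comparison. That is why a second term is needed to carry scale: the regression target is the raw geometric feature, so the magnitude information discarded by the cosine term must be recovered by the prediction head.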
