VideoFrom3D: 3D Scene Video Generation Via Complementary Image And Video Diffusion Models - Takara TLDR

In this paper, we propose VideoFrom3D, a novel framework for synthesizing
high-quality 3D scene videos from coarse geometry, a camera trajectory, and a
reference image. Our approach streamlines the 3D graphic design workflow,
enabling flexible design exploration and rapid production of deliverables. A
straightforward approach to synthesizing a video from coarse geometry might
condition a video diffusion model on geometric structure. However, existing
video diffusion models struggle to generate high-fidelity results for complex
scenes due to the difficulty of jointly modeling visual quality, motion, and
temporal consistency. To address this, we propose a generative framework that
leverages the complementary strengths of image and video diffusion models.
Specifically, our framework consists of a Sparse Anchor-view Generation (SAG)
module and a Geometry-guided Generative Inbetweening (GGI) module. The SAG module
generates high-quality, cross-view consistent anchor views using an image
diffusion model, aided by Sparse Appearance-guided Sampling. Building on these
anchor views, the GGI module faithfully interpolates intermediate frames using a
video diffusion model, enhanced by flow-based camera control and structural
guidance. Notably, both modules operate without any paired dataset of 3D scene
models and natural images, which is extremely difficult to obtain.
Comprehensive experiments show that our method produces high-quality,
style-consistent scene videos under diverse and challenging scenarios,
outperforming simple and extended baselines.
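
The two-stage structure described above (sparse anchor views first, then inbetweening between neighboring anchors) can be illustrated with a small frame-scheduling sketch. This is only an illustration under stated assumptions: the abstract does not say how anchor views are placed along the camera trajectory, so the uniform `stride` and the helper names `select_anchor_indices` and `inbetween_segments` are hypothetical, and no diffusion model is invoked here.

```python
from typing import List, Tuple


def select_anchor_indices(num_frames: int, stride: int) -> List[int]:
    """Pick sparse anchor frame indices along a trajectory.

    Assumption: anchors are spaced uniformly by `stride`, and the last
    frame is always an anchor so every frame lies between two anchors.
    The abstract does not specify the actual placement rule.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    anchors = list(range(0, num_frames, stride))
    if anchors[-1] != num_frames - 1:
        anchors.append(num_frames - 1)
    return anchors


def inbetween_segments(anchors: List[int]) -> List[Tuple[int, int, List[int]]]:
    """Pair consecutive anchors and list the frames to interpolate between them.

    Each tuple is (start_anchor, end_anchor, intermediate_frame_indices).
    In the framework described above, an image diffusion model would
    produce the anchor views and a video diffusion model would fill in
    the intermediate frames; this function only plans the indices.
    """
    segments = []
    for start, end in zip(anchors, anchors[1:]):
        segments.append((start, end, list(range(start + 1, end))))
    return segments


if __name__ == "__main__":
    anchors = select_anchor_indices(num_frames=10, stride=4)
    for start, end, middle in inbetween_segments(anchors):
        print(f"anchors {start}->{end}: interpolate frames {middle}")
```

Splitting the trajectory this way mirrors the division of labor in the abstract: the expensive, quality-critical generation happens only at a few anchor frames, and each segment between anchors becomes an independent interpolation task.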
