Paper page - Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation

A diffusion-based framework generates aligned novel views of images and geometry using warping-and-inpainting with cross-modal attention instillation and proximity-based mesh conditioning, achieving high-fidelity synthesis and 3D completion.

We introduce a diffusion-based framework that performs aligned novel view
image and geometry generation via a warping-and-inpainting methodology. Unlike
prior methods that require dense posed images or pose-embedded generative
models limited to in-domain views, our method leverages off-the-shelf geometry
predictors to predict partial geometries viewed from reference images, and
formulates novel-view synthesis as an inpainting task for both image and
geometry. To ensure accurate alignment between generated images and geometry,
we propose cross-modal attention instillation, where attention maps from the
image diffusion branch are injected into a parallel geometry diffusion branch
during both training and inference. This multi-task approach achieves
synergistic effects, facilitating geometrically robust image synthesis as well
as well-defined geometry prediction. We further introduce proximity-based mesh
conditioning to integrate depth and normal cues, interpolating between points
of the point cloud and preventing erroneously predicted geometry from
influencing the generation process. Empirically, our method achieves high-fidelity
extrapolative view synthesis on both image and geometry across a range of
unseen scenes, delivers competitive reconstruction quality under interpolation
settings, and produces geometrically aligned colored point clouds for
comprehensive 3D completion. The project page is available at
https://cvlab-kaist.github.io/MoAI.
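
The attention-sharing idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`attention_map`, `instilled_attention`), the single-head layout, and the token and feature sizes are all assumptions chosen for illustration. The sketch only shows the core mechanism stated in the abstract, namely that the attention map is computed in the image branch and then applied to the geometry branch's values, so both outputs aggregate information from the same spatial locations.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_map(q, k):
    """Scaled dot-product attention weights, shape (n_query, n_key)."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1)


def instilled_attention(img_q, img_k, img_v, geo_v):
    """Hypothetical helper: compute attention once from the image branch
    and reuse the same map for the geometry branch's values."""
    attn = attention_map(img_q, img_k)  # computed from image features only
    img_out = attn @ img_v              # image branch output
    geo_out = attn @ geo_v              # geometry branch uses the injected map
    return img_out, geo_out, attn


# Toy inputs: 6 tokens with 8-dimensional features (sizes are arbitrary).
rng = np.random.default_rng(0)
n_tokens, dim = 6, 8
img_q, img_k, img_v, geo_v = (
    rng.standard_normal((n_tokens, dim)) for _ in range(4)
)
img_out, geo_out, attn = instilled_attention(img_q, img_k, img_v, geo_v)
```

Because the geometry branch never computes its own query-key similarity in this sketch, any token correspondence found by the image branch carries over to the geometry output unchanged, which is one plausible reading of how the shared map encourages alignment between the two modalities.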
