VGGT-X: When VGGT Meets Dense Novel View Synthesis - Takara TLDR

We study the problem of applying 3D Foundation Models (3DFMs) to dense Novel
View Synthesis (NVS). Despite significant progress in Novel View Synthesis
powered by NeRF and 3DGS, current approaches remain reliant on accurate 3D
attributes (e.g., camera poses and point clouds) acquired from
Structure-from-Motion (SfM), which is often slow and fragile in low-texture or
low-overlap captures. Recent 3DFMs offer orders-of-magnitude speedups over
the traditional pipeline and show great potential for online NVS, but most of
their validation and conclusions are confined to sparse-view settings. Our study
reveals that naively scaling 3DFMs to dense views encounters two fundamental
barriers: a dramatically increasing VRAM burden and imperfect outputs that
degrade initialization-sensitive 3D training. To address these barriers, we
introduce VGGT-X, incorporating a memory-efficient VGGT implementation that
scales to 1,000+ images, an adaptive global alignment for VGGT output
enhancement, and robust 3DGS training practices. Extensive experiments show
that these measures substantially close the fidelity gap with
COLMAP-initialized pipelines, achieving state-of-the-art results in dense
COLMAP-free NVS and pose estimation. Additionally, we analyze the causes of
remaining gaps with COLMAP-initialized rendering, providing insights for the
future development of 3D foundation models and dense NVS. Our project page is
available at https://dekuliutesla.github.io/vggt-x.github.io/
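
The abstract does not say how the memory-efficient implementation works. As a purely illustrative sketch of one common way to bound peak memory, the snippet below processes frames in fixed-size chunks and concatenates the results. The names `process_in_chunks` and `toy_frame_op` are hypothetical, the per-frame operation is a stand-in, and this is not the authors' method. Chunking like this only gives identical results for frame-wise operations; layers that attend across all frames would need a different strategy.

```python
import numpy as np


def process_in_chunks(frames, fn, chunk_size):
    """Apply `fn` to `frames` chunk by chunk along axis 0 and concatenate.

    Only one chunk's intermediate results are alive at a time, so peak
    memory depends on `chunk_size` rather than on the total frame count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs = []
    for start in range(0, len(frames), chunk_size):
        outputs.append(fn(frames[start:start + chunk_size]))
    return np.concatenate(outputs, axis=0)


def toy_frame_op(batch):
    # Hypothetical stand-in for a per-frame network stage:
    # reduce each frame to its mean intensity, shape (n, 1).
    return batch.reshape(batch.shape[0], -1).mean(axis=1, keepdims=True)


# Ten synthetic 4x4 RGB frames (assumed data, for illustration only).
frames = np.random.default_rng(0).random((10, 4, 4, 3))
chunked = process_in_chunks(frames, toy_frame_op, chunk_size=3)
full = toy_frame_op(frames)
```

Because `toy_frame_op` treats every frame independently, the chunked result matches the full-batch result, which is the property that makes this kind of chunking safe.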
