Learning-based 3D reconstruction models, represented by Visual Geometry
Grounded Transformers (VGGTs), have made remarkable progress by leveraging
large-scale transformers. However, their prohibitive computational and memory costs
severely hinder real-world deployment. Post-Training Quantization (PTQ) has
become a common practice for compressing and accelerating models. However, we
empirically observe that PTQ faces unique obstacles when compressing
billion-scale VGGTs: the data-independent special tokens induce heavy-tailed
activation distributions, while the multi-view nature of 3D data makes
calibration sample selection highly unstable. This paper proposes QuantVGGT, the
first quantization framework for VGGTs, built on two main technical contributions.
First, we introduce Dual-Smoothed Fine-Grained Quantization, which integrates a
pre-global Hadamard rotation with post-local channel smoothing to robustly mitigate
heavy-tailed distributions and inter-channel variance.
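As a concrete illustration of this first component, the sketch below composes a pre-global Hadamard rotation (folded into the weights so the layer output is preserved) with a post-local per-channel smoothing scale before per-group quantization. The function names, group size, and smoothing exponent alpha are illustrative assumptions, not the released QuantVGGT implementation.

```python
# Illustrative sketch only (PyTorch): pre-global Hadamard rotation followed by
# post-local channel smoothing and per-group quantization. Names, the group size,
# and the smoothing exponent are assumptions, not the released QuantVGGT API.
import torch

def hadamard(n: int) -> torch.Tensor:
    """Sylvester construction of an orthonormal n x n Hadamard matrix
    (assumes n is a power of two; real layers may pad or use block variants)."""
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / H.shape[0] ** 0.5

def dual_smooth_quantize(x, w, n_bits=4, group_size=128, alpha=0.5):
    """x: activations (tokens, channels); w: linear weights (out, channels)."""
    C = x.shape[-1]
    H = hadamard(C).to(x.dtype)

    # 1) Pre-global rotation: spread heavy-tailed outliers across channels.
    #    Folding H into the weights keeps x @ w.T mathematically unchanged.
    x_rot, w_rot = x @ H, w @ H

    # 2) Post-local channel smoothing: migrate residual inter-channel variance
    #    from activations into weights with a per-channel scale.
    s = x_rot.abs().amax(0).pow(alpha) / (w_rot.abs().amax(0).pow(1 - alpha) + 1e-6)
    x_s, w_s = x_rot / s, w_rot * s

    # 3) Fine-grained (per-group) symmetric fake-quantization of the activations
    #    (assumes the flattened activation length is divisible by the group size).
    g = x_s.reshape(-1, group_size)
    scale = (g.abs().amax(1, keepdim=True) / (2 ** (n_bits - 1) - 1)).clamp(min=1e-8)
    q = (g / scale).round().clamp(-(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return (q * scale).reshape_as(x_s), w_s
```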
Second, we design Noise-Filtered Diverse Sampling, which filters outliers via
deep-layer statistics and constructs frame-aware, diverse calibration clusters to
ensure stable quantization ranges.
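To illustrate how such a sampling step could look in practice, the sketch below filters candidate samples with a simple deep-feature z-score rule and then runs a small k-means over the retained frame-aware features, keeping the sample closest to each centroid. The feature source, thresholds, and cluster count are placeholder assumptions, not the paper's exact procedure.

```python
# Illustrative sketch only (PyTorch): outlier filtering via deep-layer statistics,
# then frame-aware diverse calibration selection. Thresholds, cluster count, and
# the feature source are placeholder assumptions, not the paper's exact settings.
import torch

def select_calibration_set(candidates, deep_feats, n_select=32,
                           z_thresh=2.0, iters=20, seed=0):
    """candidates: list of multi-view samples; deep_feats: (N, D) statistics
    pooled from a deep transformer layer, one row per candidate."""
    # 1) Noise filtering: drop candidates whose deep-layer statistics deviate
    #    strongly from the population (simple z-score rule as a stand-in).
    norms = deep_feats.norm(dim=1)
    z = (norms - norms.mean()) / (norms.std() + 1e-6)
    keep = (z.abs() < z_thresh).nonzero(as_tuple=True)[0]
    feats = deep_feats[keep]

    # 2) Diverse sampling: k-means over the retained frame-aware features
    #    (assumes at least n_select candidates survive filtering).
    g = torch.Generator().manual_seed(seed)
    centers = feats[torch.randperm(feats.shape[0], generator=g)[:n_select]].clone()
    for _ in range(iters):
        assign = torch.cdist(feats, centers).argmin(dim=1)
        for k in range(n_select):
            members = feats[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(dim=0)

    # 3) Keep the candidate closest to each centroid (duplicates are possible;
    #    dedupe if a strictly distinct calibration set is required).
    chosen = torch.cdist(centers, feats).argmin(dim=1)
    return [candidates[int(keep[i])] for i in chosen.tolist()]
```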
Comprehensive experiments demonstrate that QuantVGGT achieves state-of-the-art
results across different benchmarks and bit-widths, surpassing the previous
state-of-the-art generic quantization method by a large margin. We highlight
that our 4-bit QuantVGGT can deliver a 3.7$\times$ memory reduction and
2.5$\times$ acceleration in real-hardware inference, while maintaining
reconstruction accuracy above 98\% of its full-precision counterpart. This
demonstrates the clear advantages and practicality of QuantVGGT in
resource-constrained scenarios. Our code is released at
https://github.com/wlfeng0509/QuantVGGT.