Generative video models demonstrate impressive text-to-video capabilities,
spurring widespread adoption in many real-world applications. However, like
large language models (LLMs), video generation models tend to hallucinate,
producing plausible-looking videos even when they are factually incorrect. Although
uncertainty quantification (UQ) of LLMs has been extensively studied in prior
work, no UQ method for video models exists, raising critical safety concerns.
To our knowledge, this paper is the first to quantify the uncertainty of
video models. We present a framework for uncertainty
quantification of generative video models, consisting of: (i) a metric for
evaluating the calibration of video models based on robust rank correlation
estimation that imposes no stringent modeling assumptions; (ii) a black-box UQ method
for video models (termed S-QUBED), which leverages latent modeling to
rigorously decompose predictive uncertainty into its aleatoric and epistemic
components; and (iii) a UQ dataset to facilitate benchmarking calibration in
video models. By conditioning the generation task in the latent space, we
disentangle uncertainty caused by vague task specifications from uncertainty
caused by a lack of knowledge. Through extensive experiments on benchmark
video datasets, we demonstrate that S-QUBED computes calibrated total
uncertainty estimates that are negatively correlated with task accuracy, and
that it effectively computes their aleatoric and epistemic constituents.
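
The two ideas named above can be illustrated with a minimal sketch. The code below is not the paper's implementation: it assumes a hypothetical discretized outcome space for the conditional predictive distributions, uses the standard information-theoretic split of predictive entropy into aleatoric and epistemic parts, and uses Spearman's rank correlation as a stand-in for the robust rank correlation estimator referenced in the abstract; all function names and interfaces are illustrative.

```python
# Illustrative sketch only (not S-QUBED's actual latent-space construction).
import numpy as np
from scipy.stats import spearmanr


def decompose_uncertainty(p_y_given_z: np.ndarray) -> dict:
    """Split predictive uncertainty into aleatoric and epistemic parts.

    p_y_given_z: array of shape (n_latent_samples, n_outcomes); row i is the
    predictive distribution over a (hypothetical) discretized outcome space
    when the generation task is conditioned on latent sample z_i.
    """
    eps = 1e-12
    # Marginal predictive distribution p(y) = E_z[p(y | z)].
    p_y = p_y_given_z.mean(axis=0)
    # Total uncertainty: entropy of the marginal predictive distribution.
    total = -np.sum(p_y * np.log(p_y + eps))
    # Aleatoric part: expected entropy of the conditionals, E_z[H(y | z)].
    aleatoric = -np.mean(np.sum(p_y_given_z * np.log(p_y_given_z + eps), axis=1))
    # Epistemic part: mutual information I(y; z) = total - aleatoric.
    return {"total": total, "aleatoric": aleatoric, "epistemic": total - aleatoric}


def rank_calibration(uncertainty: np.ndarray, accuracy: np.ndarray) -> float:
    """Rank-correlation calibration score over a set of generation tasks.

    A well-calibrated UQ method should produce uncertainties that are strongly
    negatively rank-correlated with task accuracy (score near -1).
    """
    rho, _ = spearmanr(uncertainty, accuracy)
    return rho
```

As a usage example, one would collect a per-task uncertainty estimate and a per-task accuracy score over a benchmark, then report `rank_calibration(uncertainties, accuracies)`; the decomposition function would be applied per task to separate vagueness in the prompt (aleatoric) from the model's lack of knowledge (epistemic).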