Paper Page - Exploring The Evolution Of Physics Cognition In Video Generation: A Survey

Video generation has made significant progress in recent years, especially
with the rapid advancement of diffusion models. Despite this, these models'
deficiencies in physical cognition have drawn growing attention: generated
content often violates the fundamental laws of physics, falling into the
dilemma of "visual realism but physical absurdity". Researchers have
increasingly recognized the importance of physical fidelity in video
generation and have attempted to integrate heuristic physical cognition, such
as motion representations and physical knowledge, into generative systems to
simulate real-world dynamic scenarios. Considering the lack of a systematic
overview in this field, this survey aims to provide a comprehensive summary of
architecture designs and their applications to fill this gap. Specifically, we
discuss and organize the evolutionary process of physical cognition in video
generation from a cognitive science perspective, while proposing a three-tier
taxonomy: 1) basic schema perception for generation, 2) passive cognition of
physical knowledge for generation, and 3) active cognition for world
simulation, encompassing state-of-the-art methods, classical paradigms, and
benchmarks. We then highlight the key challenges inherent in this domain and
outline potential directions for future research, helping to advance the
discussion in both academia and industry. Through
structured review and interdisciplinary analysis, this survey aims to provide
directional guidance for developing interpretable, controllable, and physically
consistent video generation paradigms, thereby propelling generative models
from the stage of "visual mimicry" towards a new phase of "human-like
physical comprehension".
