A framework combining visual priors and dynamic constraints within a synchronized diffusion process generates HOI video and motion simultaneously, enhancing video-motion consistency and generalization.
Hand-Object Interaction (HOI) generation has significant application
potential. However, current 3D HOI motion generation approaches heavily rely on
predefined 3D object models and lab-captured motion data, limiting
generalization capabilities. Meanwhile, HOI video generation methods prioritize
pixel-level visual fidelity, often sacrificing physical plausibility.
Recognizing that visual appearance and motion patterns share fundamental
physical laws in the real world, we propose a novel framework that combines
visual priors and dynamic constraints within a synchronized diffusion process
to generate HOI video and motion simultaneously. To integrate the
heterogeneous semantics, appearance, and motion features, our method implements
tri-modal adaptive modulation for feature alignment, coupled with 3D
full-attention for modeling inter- and intra-modal dependencies. Furthermore,
we introduce a vision-aware 3D interaction diffusion model that generates
explicit 3D interaction sequences directly from the synchronized diffusion
outputs, then feeds them back to establish a closed-loop feedback cycle. This
architecture eliminates dependencies on predefined object models or explicit
pose guidance while significantly enhancing video-motion consistency.
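A minimal sketch of how the tri-modal adaptive modulation and joint full-attention could be wired together is given below. The module names, tensor shapes, and the conditioning pathway are illustrative assumptions rather than the exact implementation, and the 3D full-attention is simplified to a single attention pass over the concatenated semantic, appearance, and motion tokens:

```python
import torch
import torch.nn as nn

class TriModalBlock(nn.Module):
    """One joint block: per-modality adaptive modulation, then shared attention."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        # One scale/shift/gate predictor per modality (semantics, appearance, motion),
        # conditioned on a shared embedding (e.g. timestep + text), AdaLN-style.
        self.mod = nn.ModuleList([nn.Linear(dim, 3 * dim) for _ in range(3)])
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, sem, app, mot, cond):
        # sem/app/mot: (B, L_i, dim) token sequences; cond: (B, dim) shared condition.
        tokens, gates, lengths = [], [], []
        for x, mod in zip((sem, app, mot), self.mod):
            shift, scale, gate = mod(cond).unsqueeze(1).chunk(3, dim=-1)
            tokens.append(self.norm(x) * (1 + scale) + shift)  # align feature statistics
            gates.append(gate)
            lengths.append(x.shape[1])
        # Full attention over the concatenation models inter- and intra-modal dependencies.
        joint = torch.cat(tokens, dim=1)
        out, _ = self.attn(joint, joint, joint)
        sem_o, app_o, mot_o = torch.split(out, lengths, dim=1)
        return sem + gates[0] * sem_o, app + gates[1] * app_o, mot + gates[2] * mot_o
```

In this sketch the block would be called once per denoising step, with the three updated token streams passed on to the next block.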
Experimental results demonstrate our method’s superiority over state-of-the-art
approaches in generating high-fidelity, dynamically plausible HOI sequences,
with notable generalization capabilities in unseen real-world scenarios.
Project page at https://github.com/Droliven/SViMo_project.
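The closed-loop feedback cycle can be summarized by the following sampling loop. The latent shapes, the update rule, and both network stand-ins are placeholders used purely to show the data flow between the synchronized diffusion and the vision-aware 3D interaction diffusion; they are not the trained models or the actual sampler:

```python
import torch

# Hypothetical stand-ins for the two networks; the real models are learned
# diffusion networks, these only illustrate the data flow of the feedback cycle.
def sync_denoise(video, motion, t, interaction):
    """Synchronized diffusion step: predicts noise for video and motion jointly."""
    return torch.zeros_like(video), torch.zeros_like(motion)

def interaction_lift(video, motion):
    """Vision-aware 3D interaction diffusion: lifts the current estimates to an
    explicit 3D interaction sequence (e.g. per-frame hand joints)."""
    return torch.zeros(video.shape[0], motion.shape[1], 21, 3)

@torch.no_grad()
def sample_closed_loop(steps=50):
    video = torch.randn(1, 16, 4, 32, 32)   # assumed latent video shape (B, T, C, H, W)
    motion = torch.randn(1, 16, 128)         # assumed latent motion shape (B, T, D)
    interaction = None                       # no explicit interaction at the first step
    for t in reversed(range(steps)):
        # 1) Jointly denoise video and motion, conditioned on the latest
        #    explicit interaction estimate.
        video_eps, motion_eps = sync_denoise(video, motion, t, interaction)
        video = video - video_eps / steps     # placeholder update; a real sampler
        motion = motion - motion_eps / steps  # would follow a proper noise schedule
        # 2) Generate an explicit 3D interaction sequence from the current
        #    estimates and feed it back, closing the loop.
        interaction = interaction_lift(video, motion)
    return video, motion, interaction
```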