Paper Page - InterActHuman: Multi-Concept Human Animation With Layout-Aligned Audio Conditions

A new framework enables precise, per-identity control of multiple concepts in end-to-end human animation by enforcing region-specific binding of multi-modal conditions.

End-to-end human animation with rich multi-modal conditions, e.g., text,
image, and audio, has achieved remarkable advances in recent years. However,
most existing methods can only animate a single subject and inject conditions
in a global manner, ignoring scenarios in which multiple concepts appear in
the same video with rich human-human and human-object interactions. This
global assumption prevents precise, per-identity control of multiple concepts,
including humans and objects, and therefore hinders applications. In this
work, we discard the single-entity assumption and introduce a novel framework
that enforces strong, region-specific binding of multi-modal conditions to
each identity’s spatiotemporal footprint. Given reference images of multiple
concepts, our method automatically infers layout information by leveraging a
mask predictor to match appearance cues between the denoised video and each
reference appearance. Furthermore, we inject each local audio condition into
its corresponding region, in an iterative manner, to ensure layout-aligned
modality matching. This design enables the high-quality generation of
controllable multi-concept human-centric videos. Empirical results and
ablation studies validate the effectiveness of our explicit layout control
for multi-modal conditions compared to implicit counterparts and other
existing methods.
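
To make the idea of layout-aligned injection concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not say how audio is fused into the video features, so simple masked addition is used here as a stand-in. The function name inject_local_audio, the identity labels, and the toy shapes are all hypothetical. The sketch only shows that each identity's audio vector affects the positions covered by that identity's mask, whereas global injection would touch every position.

```python
from typing import Dict, List

Grid = List[List[float]]            # H x W soft mask, values in [0, 1]
Features = List[List[List[float]]]  # H x W x D features for one frame


def inject_local_audio(
    features: Features,
    masks: Dict[str, Grid],
    audio: Dict[str, List[float]],
) -> Features:
    """Add each identity's audio vector only where its mask is active.

    Masked addition is an assumption made for illustration; the real
    fusion operator is not described in the abstract.
    """
    # Copy so the input features are left unchanged.
    out = [[list(vec) for vec in row] for row in features]
    for identity, mask in masks.items():
        audio_vec = audio[identity]
        for y, row in enumerate(mask):
            for x, weight in enumerate(row):
                for d, value in enumerate(audio_vec):
                    out[y][x][d] += weight * value
    return out


# Toy frame: 1 x 4 positions, 2 channels, all zeros.
frame = [[[0.0, 0.0] for _ in range(4)]]
# Hypothetical predicted layout: person_a on the left, person_b far right.
masks = {
    "person_a": [[1.0, 1.0, 0.0, 0.0]],
    "person_b": [[0.0, 0.0, 0.0, 1.0]],
}
# Hypothetical per-identity audio embeddings.
audio = {
    "person_a": [0.5, -0.5],
    "person_b": [2.0, 1.0],
}
result = inject_local_audio(frame, masks, audio)
```

In this toy example the position covered by neither mask keeps its original features, which is the property a global injection scheme cannot provide. In an iterative setting, the masks would be re-predicted from the partially denoised video at each step and the injection repeated with the updated layout.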
