FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

Creating a realistic animatable avatar from a single static portrait remains challenging. Existing approaches often struggle to capture subtle facial expressions, the associated global body movements, and the dynamic background. To address these limitations, we propose a novel framework that leverages a pretrained video diffusion transformer model to generate high-fidelity, coherent talking portraits with controllable motion dynamics. At the core of our work is a dual-stage audio-visual alignment strategy. In the first stage, we employ a clip-level training scheme to establish coherent global motion by aligning audio-driven dynamics across the entire scene, including the reference portrait, contextual objects, and background. In the second stage, we refine lip movements at the frame level using a lip-tracing mask, ensuring precise synchronization with audio signals. To preserve identity without compromising motion flexibility, we replace the commonly used reference network with a facial-focused cross-attention module that effectively maintains facial consistency throughout the video. Furthermore, we integrate a motion intensity modulation module that explicitly controls expression and body motion intensity, enabling controllable manipulation of portrait movements beyond mere lip motion. Extensive experimental results show that our proposed approach achieves higher quality than existing methods, with better realism, coherence, motion intensity, and identity preservation. Our project page: this https URL.
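The abstract does not specify how the lip-tracing mask enters the second (frame-level) training stage. One common way to focus a reconstruction objective on a region is to up-weight the per-pixel loss inside a mask. The sketch below illustrates that generic technique only; it is an assumption, not the authors' implementation. The function name `lip_weighted_mse` and the parameter `lip_weight` are hypothetical, and plain nested lists stand in for the latent video tensors a real diffusion model would use.

```python
from typing import List

# One frame as an H x W grid of values (e.g., a single latent channel).
Frame = List[List[float]]


def lip_weighted_mse(pred: List[Frame], target: List[Frame],
                     lip_mask: List[Frame], lip_weight: float = 2.0) -> float:
    """Weighted mean squared error over T frames (hypothetical sketch).

    Pixels where lip_mask == 1 get weight `lip_weight`; all other pixels get
    weight 1. The result is normalized by the total weight, so with
    lip_weight == 1 it reduces to an ordinary MSE.
    Assumes pred, target and lip_mask share the same T x H x W shape;
    only the frame count is checked here.
    """
    if not (len(pred) == len(target) == len(lip_mask)):
        raise ValueError("pred, target and lip_mask must have the same number of frames")
    total, weight_sum = 0.0, 0.0
    for p_frame, t_frame, m_frame in zip(pred, target, lip_mask):
        for p_row, t_row, m_row in zip(p_frame, t_frame, m_frame):
            for p, t, m in zip(p_row, t_row, m_row):
                # Weight is 1 outside the mask and lip_weight inside it.
                w = 1.0 + (lip_weight - 1.0) * m
                total += w * (p - t) ** 2
                weight_sum += w
    return total / weight_sum


if __name__ == "__main__":
    # One frame, one row, two pixels: only the first pixel is inside the lip mask.
    pred = [[[1.0, 0.0]]]
    target = [[[0.0, 0.0]]]
    mask = [[[1, 0]]]
    print(lip_weighted_mse(pred, target, mask, lip_weight=3.0))  # 0.75
```

Normalizing by the summed weights keeps the loss on the same scale as a plain MSE while shifting emphasis toward the mouth region; setting `lip_weight` very high approximates computing the loss only inside the mask.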
