Paper Page - RealisDance-DiT: Simple Yet Strong Baseline Towards Controllable Character Animation In The Wild

Controllable character animation remains a challenging problem, particularly
in handling rare poses, stylized characters, character-object interactions,
complex illumination, and dynamic scenes. To tackle these issues, prior work
has largely focused on injecting pose and appearance guidance via elaborate
bypass networks, but often struggles to generalize to open-world scenarios. In
this paper, we propose a new perspective that, as long as the foundation model
is powerful enough, straightforward model modifications with flexible
fine-tuning strategies can largely address the above challenges, taking a step
towards controllable character animation in the wild. Specifically, we
introduce RealisDance-DiT, built upon the Wan-2.1 video foundation model. Our
thorough analysis reveals that the widely adopted Reference Net design is
suboptimal for large-scale DiT models. Instead, we demonstrate that minimal
modifications to the foundation model architecture yield a surprisingly strong
baseline. We further propose the low-noise warmup and “large batches and small
iterations” strategies to accelerate model convergence during fine-tuning while
maximally preserving the priors of the foundation model. In addition, we
introduce a new test dataset that captures diverse real-world challenges,
complementing existing benchmarks such as the TikTok dataset and the UBC fashion
video dataset, to comprehensively evaluate the proposed method. Extensive experiments
show that RealisDance-DiT outperforms existing methods by a large margin.
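
The abstract names a low-noise warmup but does not define it. As a purely illustrative sketch, one hypothetical reading is that timesteps sampled early in fine-tuning are biased toward low noise levels and the bias is relaxed to uniform sampling as warmup ends. The function name `sample_timestep`, the exponent schedule, the 1000-step range, and the convention that timestep 0 carries the least noise are all assumptions made for this example, not details taken from the paper.

```python
import random


def sample_timestep(step, warmup_steps, rng, num_timesteps=1000):
    """Return a timestep in [0, num_timesteps - 1].

    Assumed convention: 0 is the least noisy timestep. This is a
    hypothetical schedule, not the paper's published procedure.
    """
    # Fraction of the warmup phase completed, clamped to [0, 1].
    if warmup_steps > 0:
        progress = min(max(step / warmup_steps, 0.0), 1.0)
    else:
        progress = 1.0
    # An exponent above 1 pushes u toward 0 (low noise). It decays
    # linearly from 3 to 1, where sampling becomes uniform.
    exponent = 3.0 - 2.0 * progress
    u = rng.random() ** exponent
    return min(int(u * num_timesteps), num_timesteps - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 50, 100):
        draws = [sample_timestep(step, 100, rng) for _ in range(2000)]
        print(step, sum(draws) / len(draws))
```

Under these assumptions the average sampled timestep starts low and rises toward the midpoint of the range once warmup finishes, so the earliest updates see mostly lightly corrupted inputs.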
