Paper Page - MOSPA: Human Motion Generation Driven By Spatial Audio

MOSPA: Human Motion Generation Driven by Spatial Audio

Project Page: https://frank-zy-dou.github.io/projects/MOSPA/index.html
Paper: https://arxiv.org/abs/2507.11949

Abstract: Enabling virtual humans to dynamically and realistically respond to diverse auditory stimuli remains a key challenge in character animation, demanding the integration of perceptual modeling and motion synthesis. Despite its significance, this task remains largely unexplored. Most previous works have primarily focused on mapping modalities like speech, audio, and music to generate human motion. As of yet, these models typically overlook the impact of spatial features encoded in spatial audio signals on human motion. To bridge this gap and enable high-quality modeling of human movements in response to spatial audio, we introduce the first comprehensive Spatial Audio-Driven Human Motion (SAM) dataset, which contains diverse and high-quality spatial audio and motion data. For benchmarking, we develop a simple yet effective diffusion-based generative framework for human MOtion generation driven by SPatial Audio, termed MOSPA, which faithfully captures the relationship between body motion and spatial audio through an effective fusion mechanism. Once trained, MOSPA could generate diverse realistic human motions conditioned on varying spatial audio inputs. We perform a thorough investigation of the proposed dataset and conduct extensive experiments for benchmarking, where our method achieves state-of-the-art performance on this task. Our model and dataset will be open-sourced upon acceptance. Please refer to our supplementary video for more details.

Source link

What's Hot

Putting AI To Work To Stymie The Email Fraudsters And Crooks

Why Big Investors Are All Ears For Voice AI Startups

AI gaming startup Born raises $15M to build ‘social’ AI companions that combat loneliness

Paper page – MOSPA: Human Motion Generation Driven by Spatial Audio

UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward – Takara TLDR

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions – Takara TLDR

Q-Sched: Pushing the Boundaries of Few-Step Diffusion Models with Quantization-Aware Scheduling – Takara TLDR

Leon Black and Leslie Wexner’s Letters to Jeffrey Epstein Released

School of Visual Arts Transfers Ownership to Nonprofit Alumni Society

Cristin Tierney Moves Gallery to Tribeca for 15th Anniversary Exhibition

Anne Imhof Reimagines Football Jerseys with Nike

Putting AI To Work To Stymie The Email Fraudsters And Crooks

Why Big Investors Are All Ears For Voice AI Startups

AI gaming startup Born raises $15M to build ‘social’ AI companions that combat loneliness

What's Hot

Paper page – MOSPA: Human Motion Generation Driven by Spatial Audio

Related Posts

Subscribe to Updates