Audio-driven talking head synthesis has achieved remarkable photorealism, yet
state-of-the-art (SOTA) models exhibit a critical failure: they do not
generalize across the full spectrum of human diversity in ethnicity, language,
and age. We argue that this generalization gap is a direct symptom of
limitations in existing training data, which lack the necessary scale, quality,
and diversity. To address this challenge, we introduce TalkVid, a new
large-scale, high-quality, and diverse dataset containing 1244 hours of video
from 7729 unique speakers. TalkVid is curated through a principled, multi-stage
automated pipeline that rigorously filters for motion stability, aesthetic
quality, and facial detail, and is validated against human judgments to ensure
its reliability. Furthermore, we construct and release TalkVid-Bench, a
stratified evaluation set of 500 clips meticulously balanced across key
demographic and linguistic axes. Our experiments demonstrate that a model
trained on TalkVid outperforms counterparts trained on previous datasets,
exhibiting superior cross-dataset generalization. Crucially, our analysis on
TalkVid-Bench reveals performance disparities across subgroups that are
obscured by traditional aggregate metrics, underscoring its necessity for
future research. Code and data are available at
https://github.com/FreedomIntelligence/TalkVid
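
As an illustrative aside, the multi-stage curation pipeline described above can
be sketched as a sequence of per-clip filters. This is a minimal sketch only:
the abstract does not specify the scoring models or thresholds, so every name
and value below is a hypothetical placeholder rather than the paper's actual
implementation.

```python
# Hypothetical sketch of a multi-stage filtering pipeline; all scores and
# thresholds are illustrative assumptions, not the paper's released code.
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Clip:
    path: str
    motion_stability: float   # e.g., inverse of head-pose jitter (assumed metric)
    aesthetic_score: float    # e.g., output of a learned aesthetic predictor (assumed)
    face_detail_score: float  # e.g., sharpness of the face crop (assumed)


def multi_stage_filter(
    clips: Iterable[Clip],
    stages: List[Callable[[Clip], bool]],
) -> List[Clip]:
    """Keep only clips that pass every stage, applied in order."""
    kept = list(clips)
    for stage in stages:
        kept = [clip for clip in kept if stage(clip)]
    return kept


# Hypothetical thresholds for the three criteria named in the abstract.
stages = [
    lambda c: c.motion_stability >= 0.8,
    lambda c: c.aesthetic_score >= 5.0,
    lambda c: c.face_detail_score >= 0.6,
]

# Example usage (raw_clips would come from an upstream video source):
# filtered = multi_stage_filter(raw_clips, stages)
```

Ordering the stages so that cheaper checks run first is a common design for
large-scale video curation, since it prunes most clips before the more
expensive scoring models are applied.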