Paper Page - C3: A Bilingual Benchmark For Spoken Dialogue Models Exploring Challenges In Complex Conversations

📣 C3 Benchmark: The Challenging Benchmark for Bilingual Speech Dialogue Models!

🎙️ C3 is the first-ever benchmark dataset that tests complex phenomena in speech dialogues, covering pauses, homophones, stress, intonation, syntactic ambiguity, coreference, omission, and multi-turn conversations.
📊 With 1,079 real-world scenarios and 1,586 audio-text pairs, it leaves speech dialogue models struggling to keep up!

🔥 Challenge Examples:

“He saw the man / with glasses” vs “He saw / the man with glasses”: Who has the glasses, he or the man?
“Mr. Smith loves music more than his wife”: Does it mean “Mr. Smith loves music more than he loves his wife” or “Mr. Smith loves music more than his wife does”?
“Joan made sure to thank Susan for all the help she had received”: Does “she” refer to Joan or Susan?

📈 Evaluation Results (As of July 30, 2025):

Best Model in Chinese: Qwen2.5-Omni (40.08%)
Best Model in English: GPT-4o-Audio-Preview (55.68%)

🔗 Experience C3 Now:

🔥 Limited Time Offer! Until Sept. 1, 2025, we will run the evaluation script on your spoken dialogue model (SDM) to get its results on our benchmark, free of charge. After that date, you can run the evaluation independently. To participate, email chengqianma@yeah.net with the subject: [C3Bench Evaluation] – [Model_Name]
