Paper Page - SAKURA: On The Multi-hop Reasoning Of Large Audio-Language Models Based On Speech And Audio Information

SAKURA is introduced to evaluate the multi-hop reasoning abilities of large audio-language models, revealing their struggles in integrating speech/audio representations.

Large audio-language models (LALMs) extend large language models with
multimodal understanding of speech, audio, and other modalities. While their
performance on speech and audio-processing tasks has been extensively studied,
their reasoning abilities remain underexplored. In particular, their multi-hop
reasoning, the ability to recall and integrate multiple facts, lacks systematic evaluation.
Existing benchmarks focus on general speech and audio-processing tasks,
conversational abilities, and fairness but overlook this aspect. To bridge this
gap, we introduce SAKURA, a benchmark assessing LALMs’ multi-hop reasoning
based on speech and audio information. Results show that LALMs struggle to
integrate speech/audio representations for multi-hop reasoning, even when they
extract the relevant information correctly, highlighting a fundamental
challenge in multimodal reasoning. Our findings expose a critical limitation in
LALMs, offering insights and resources for future research.
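
The central finding, that models fail at multi-hop reasoning even when they extract the relevant information correctly, can be read as a conditional accuracy: among the items where the single-hop (extraction) question is answered correctly, how often is the multi-hop question also answered correctly? The sketch below illustrates that calculation only. The `ItemResult` record and its field names are hypothetical and are not taken from SAKURA's data format or evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemResult:
    """Hypothetical per-item outcome; not SAKURA's actual schema."""
    single_hop_correct: bool  # model extracted the attribute from the audio
    multi_hop_correct: bool   # model answered the question that builds on it


def single_hop_accuracy(results: List[ItemResult]) -> Optional[float]:
    """Fraction of items where the extraction question was answered correctly."""
    if not results:
        return None
    return sum(r.single_hop_correct for r in results) / len(results)


def conditional_multi_hop_accuracy(results: List[ItemResult]) -> Optional[float]:
    """Multi-hop accuracy restricted to items with correct extraction.

    A low value here despite high single-hop accuracy would indicate that
    the failure lies in integrating the extracted information, not in
    perceiving it.
    """
    extracted = [r for r in results if r.single_hop_correct]
    if not extracted:
        return None
    return sum(r.multi_hop_correct for r in extracted) / len(extracted)


# Made-up outcomes for illustration only; these are not results from the paper.
example = [
    ItemResult(True, True),
    ItemResult(True, False),
    ItemResult(True, False),
    ItemResult(False, False),
]
print(single_hop_accuracy(example))
print(conditional_multi_hop_accuracy(example))
```

Reporting the conditional figure alongside plain single-hop accuracy separates perception errors from integration errors, which is the distinction the abstract draws.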
