Even without directly hearing sounds, humans can effortlessly reason about
auditory properties such as pitch, loudness, or sound-source associations by
drawing on auditory commonsense. In contrast, language models often lack this
capability, limiting their effectiveness in multimodal interactions. As an
initial step to address this gap, we present AuditoryBench++, a comprehensive
benchmark for evaluating auditory knowledge and reasoning in text-only
settings. The benchmark encompasses tasks that range from basic auditory
comparisons to contextually grounded reasoning, enabling fine-grained analysis
of how models process and integrate auditory concepts. In addition, we
introduce AIR-CoT, a novel auditory imagination reasoning method that generates
and integrates auditory knowledge during inference by detecting imagination
spans with special tokens and injecting the corresponding auditory information.
Extensive experiments with recent
LLMs and Multimodal LLMs demonstrate that AIR-CoT generally outperforms both
off-the-shelf models and models augmented with auditory knowledge. The
project page is available at https://auditorybenchpp.github.io.
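To make the described inference flow concrete, the sketch below illustrates one plausible reading of an AIR-CoT-style step: detect a span delimited by special imagination tokens, obtain auditory knowledge for that span, inject it into the context, and continue decoding. This is a minimal toy sketch under assumptions, not the paper's implementation; the token names (<imagine>, </imagine>) and helpers (detect_imagination_span, inject_auditory_knowledge) are hypothetical placeholders.

```python
# Hypothetical sketch of an AIR-CoT-style inference step.
# All names and the string-level "injection" below are illustrative assumptions;
# the actual method operates inside an LLM's decoding loop.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    start: int  # index of the opening special token
    end: int    # index of the closing special token
    text: str   # text between the special tokens


def detect_imagination_span(tokens: List[str]) -> Optional[Span]:
    """Find the span delimited by the (assumed) special imagination tokens."""
    try:
        s = tokens.index("<imagine>")
        e = tokens.index("</imagine>", s + 1)
    except ValueError:
        return None  # no imagination span in this partial output
    return Span(s, e, " ".join(tokens[s + 1:e]))


def inject_auditory_knowledge(span: Span) -> str:
    """Placeholder for the knowledge-injection step: in practice this would
    query an auditory-knowledge module rather than return a literal string."""
    return f"[auditory knowledge for: {span.text}]"


def air_cot_step(partial_output: str) -> str:
    """Detect an imagination span, inject knowledge after it, and return the
    augmented context so decoding can continue with the injected information."""
    tokens = partial_output.split()
    span = detect_imagination_span(tokens)
    if span is None:
        return partial_output
    knowledge = inject_auditory_knowledge(span)
    augmented = tokens[:span.end + 1] + [knowledge] + tokens[span.end + 1:]
    return " ".join(augmented)


if __name__ == "__main__":
    draft = ("A mosquito buzz is <imagine> the pitch of a mosquito wingbeat "
             "</imagine> higher than a cello note.")
    print(air_cot_step(draft))
```

Running the example inserts the (placeholder) auditory knowledge immediately after the imagination span, mimicking how injected information would condition the rest of the model's reasoning.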