HumanSense: From Multimodal Perception To Empathetic Context-Aware Responses Through Reasoning MLLMs - Takara TLDR

While Multimodal Large Language Models (MLLMs) show immense promise for
achieving truly human-like interactions, progress is hindered by the lack of
fine-grained evaluation frameworks for human-centered scenarios, encompassing
both the understanding of complex human intentions and the provision of
empathetic, context-aware responses. Here we introduce HumanSense, a
comprehensive benchmark designed to evaluate the human-centered perception and
interaction capabilities of MLLMs, with a particular focus on deep
understanding of extended multimodal contexts and the formulation of rational
feedback. Our evaluation reveals that leading MLLMs still have considerable
room for improvement, particularly for advanced interaction-oriented tasks.
Supplementing visual input with audio and text information yields substantial
improvements, and Omni-modal models show advantages on these tasks.
Furthermore, we argue that appropriate feedback stems from a contextual
analysis of the interlocutor’s needs and emotions, with reasoning ability
serving as the key to unlocking it. Accordingly, we employ multi-stage,
modality-progressive reinforcement learning to enhance the reasoning abilities
of an Omni model, achieving substantial gains in evaluation results.
Additionally, we observe that successful reasoning processes exhibit highly
consistent thought patterns. By designing corresponding prompts, we also
enhance the performance of non-reasoning models in a training-free manner.
Project page: https://digital-avatar.github.io/ai/HumanSense/
