Browsing: Hugging Face
Recently, reasoning-based MLLMs have achieved a degree of success in generating long-form textual reasoning chains. However, they still struggle with…
Think-RM is a framework that enhances generative reward models with long-horizon reasoning and a novel pairwise RLHF pipeline to improve…
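As a rough illustration of the pairwise, generative-reward-model idea (not Think-RM's actual pipeline), the sketch below asks an instruction-tuned model to reason over two candidate responses and then emit a verdict; the judge model name, prompt template, and verdict parsing are all assumptions made for the example.

```python
# Minimal sketch of pairwise judging with a generative reward model.
# The judge model, prompt format, and verdict parsing are illustrative
# placeholders, not Think-RM's actual training or inference pipeline.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # placeholder judge model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def pairwise_preference(prompt: str, response_a: str, response_b: str) -> str:
    """Ask the generative RM to reason about both responses, then emit a verdict."""
    judge_prompt = (
        "You are a reward model. Compare the two responses to the prompt below.\n"
        f"Prompt: {prompt}\n\n"
        f"Response A: {response_a}\n\n"
        f"Response B: {response_b}\n\n"
        "Think step by step, then end with 'Verdict: A' or 'Verdict: B'."
    )
    inputs = tokenizer(judge_prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens and crudely parse the final verdict.
    text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return "A" if "Verdict: A" in text else "B"
```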
RAVENEA, a retrieval-augmented benchmark, enhances visual culture understanding in VLMs through culture-focused tasks; models augmented with it outperform non-augmented models across various metrics.…
The study identifies and analyzes OCR Heads within Large Vision Language Models, revealing their unique activation patterns and roles in…
Metaphorical comprehension in images remains a critical challenge for AI systems, as existing models struggle to grasp the nuanced cultural,…
The Multi-SpatialMLLM framework enhances MLLMs with multi-frame spatial understanding through depth perception, visual correspondence, and dynamic perception, achieving significant gains in…
Strategies including prompting and contrastive frameworks using latent concepts from sparse autoencoders effectively personalize LLM translations in low-resource settings while…
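A toy sketch of the latent-concept idea follows: encode hidden states with a sparse-autoencoder (SAE) encoder, build a user "concept profile" from their preferred translations, and rank candidate translations by similarity to that profile. The SAE weights, dimensions, and ranking rule are placeholders, not the paper's actual prompting or contrastive framework.

```python
# Toy sketch: rank candidate translations by how well their SAE latent-concept
# activations match a user's style profile. Random SAE weights stand in for a
# pretrained SAE over the LLM's residual stream.
import torch

D_MODEL, D_SAE = 768, 4096  # hidden size and SAE dictionary size (assumed)
W_enc = torch.randn(D_SAE, D_MODEL) / D_MODEL**0.5
b_enc = torch.zeros(D_SAE)

def sae_concepts(hidden: torch.Tensor) -> torch.Tensor:
    """Encode a hidden state into sparse latent-concept activations."""
    return torch.relu(hidden @ W_enc.T + b_enc)

def concept_profile(example_hiddens: list[torch.Tensor]) -> torch.Tensor:
    """Average the concept activations of a user's preferred translations."""
    return torch.stack([sae_concepts(h) for h in example_hiddens]).mean(dim=0)

def rank_candidates(profile: torch.Tensor, candidate_hiddens: list[torch.Tensor]) -> int:
    """Pick the candidate whose concept activations best match the user profile."""
    sims = [torch.cosine_similarity(profile, sae_concepts(h), dim=0) for h in candidate_hiddens]
    return int(torch.tensor(sims).argmax())

# Usage with placeholder hidden states (in practice, taken from the LLM's residual stream):
user_examples = [torch.randn(D_MODEL) for _ in range(3)]
candidates = [torch.randn(D_MODEL) for _ in range(2)]
best = rank_candidates(concept_profile(user_examples), candidates)
```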
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
LLMs rarely retract incorrect answers they believe to be factually correct, but supervised fine-tuning can improve their retraction performance by…
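The snippet does not show how the fine-tuning data is laid out; the sketch below is a hypothetical format for retraction-style SFT examples (question, believed answer, retraction or confirmation target), written only to make the idea concrete.

```python
# Hypothetical SFT data layout for teaching retraction behavior; the exact
# format used in the paper is not given above, so this structure is an assumption.
import json

examples = [
    {
        "prompt": "Q: Who wrote 'The Trial'?\nA: Charles Dickens.\nIs that answer correct?",
        "completion": "No. I retract that answer: 'The Trial' was written by Franz Kafka.",
    },
    {
        "prompt": "Q: What is the capital of Australia?\nA: Canberra.\nIs that answer correct?",
        "completion": "Yes, Canberra is correct, so no retraction is needed.",
    },
]

# Write one JSON object per line, a common format for SFT pipelines.
with open("retraction_sft.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```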
SAKURA is introduced to evaluate the multi-hop reasoning abilities of large audio-language models, revealing their struggles to integrate speech/audio representations…
A dataset benchmarks the spatial and physical reasoning of LLMs using topology optimization tasks, without requiring simulation tools. We introduce a novel…