Paper page – Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
TON, a two-stage training strategy combining supervised fine-tuning with thought dropout and Group Relative Policy Optimization, reduces unnecessary reasoning steps…
Reinforcement Learning (RL) has become a powerful tool for enhancing the reasoning abilities of large language models (LLMs) by optimizing…
A new benchmark, AgentIF, evaluates Large Language Models’ ability to follow complex instructions in realistic agentic scenarios, revealing performance limitations…
FoVer is a method for automatically annotating step-level error labels using formal verification tools to train Process Reward Models, which…
Recently, reasoning-based MLLMs have achieved a degree of success in generating long-form textual reasoning chains. However, they still struggle with…
Think-RM is a framework that enhances generative reward models with long-horizon reasoning and a novel pairwise RLHF pipeline to improve…
RAVENEA, a retrieval-augmented benchmark, enhances visual culture understanding in VLMs through culture-focused tasks and outperforms non-augmented models across various metrics.…
The study identifies and analyzes OCR Heads within Large Vision Language Models, revealing their unique activation patterns and roles in…
Metaphorical comprehension in images remains a critical challenge for AI systems, as existing models struggle to grasp the nuanced cultural,…
The Multi-SpatialMLLM framework enhances MLLMs with multi-frame spatial understanding through depth perception, visual correspondence, and dynamic perception, achieving significant gains in…