Browsing: Hugging Face
Did you know that fine-tuning retrievers & re-rankers on large but unclean training datasets can harm their performance? 😡 In…
A novel Self-Braking Tuning framework reduces overthinking and unnecessary computational overhead in large reasoning models by enabling the model to…
WebAgent-R1 is a simple yet effective end-to-end multi-turn RL framework for training web agents. It learns directly from online interactions…
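The snippet below is a minimal sketch, not WebAgent-R1's released code, of the kind of multi-turn rollout loop an end-to-end web-agent RL framework collects online trajectories with; the agent and environment interfaces (`act`, `reset`, `step`) are assumed placeholders for a policy model and a browser environment.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    observation: str   # page state shown to the agent
    action: str        # action the agent emitted (e.g. a click or type command)
    reward: float      # per-turn reward (often zero until task completion)

@dataclass
class Trajectory:
    turns: list = field(default_factory=list)

def collect_trajectory(agent, env, max_turns=10):
    """Roll out one multi-turn episode from online interaction with the environment."""
    traj = Trajectory()
    obs = env.reset()
    for _ in range(max_turns):
        action = agent.act(obs)                # policy conditions on the current observation
        obs, reward, done = env.step(action)   # environment returns next page state
        traj.turns.append(Turn(obs, action, reward))
        if done:
            break
    return traj
```

In a multi-turn setup like this, the collected trajectories (rather than single prompt-response pairs) become the unit the RL objective is optimized over.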
LaViDa, a family of vision-language models built on discrete diffusion models, offers competitive performance on multimodal benchmarks with advantages in…
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
TON, a two-stage training strategy combining supervised fine-tuning with thought dropout and Group Relative Policy Optimization, reduces unnecessary reasoning steps…
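As a rough illustration of the data-side idea only (an assumption about how thought dropout can be read, not TON's actual implementation or schema), the reasoning trace of an SFT example can be randomly replaced with an empty thought block so the model also sees examples where it answers directly:

```python
import random

def apply_thought_dropout(example, p_drop=0.5, empty_thought="<think>\n\n</think>"):
    """Hypothetical sketch: with probability p_drop, blank the reasoning trace in an
    SFT example so the model learns that skipping explicit thought is also valid.
    The 'thought' field name is an assumption, not TON's released format."""
    if random.random() < p_drop:
        return {**example, "thought": empty_thought}
    return example

# Example usage with a toy SFT record:
record = {"prompt": "2 + 2 = ?", "thought": "<think>add the numbers</think>", "answer": "4"}
print(apply_thought_dropout(record, p_drop=1.0))  # thought replaced by the empty block
```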
Reinforcement Learning (RL) has become a powerful tool for enhancing the reasoning abilities of large language models (LLMs) by optimizing…
A new benchmark, AgentIF, evaluates Large Language Models’ ability to follow complex instructions in realistic agentic scenarios, revealing performance limitations…
FoVer is a method for automatically annotating step-level error labels using formal verification tools to train Process Reward Models, which…
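A hedged sketch of the labeling idea, under the assumption that a formal verifier can be wrapped as a per-step boolean check; the `verify_step` callable is hypothetical and not FoVer's interface:

```python
def label_steps_with_verifier(steps, verify_step):
    """Produce step-level correctness labels for PRM training by running each
    candidate reasoning step through a verification callback (e.g. a wrapper
    around a proof assistant or symbolic checker)."""
    return [1 if verify_step(step) else 0 for step in steps]

# Toy usage with a stand-in verifier that flags an obviously false claim:
toy_labels = label_steps_with_verifier(
    ["x = 2, so x + 1 = 3", "therefore 3 > 5"],
    verify_step=lambda s: "3 > 5" not in s,
)
print(toy_labels)  # [1, 0]
```

The step-level labels produced this way can then supervise a Process Reward Model without manual annotation.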
Recently, reasoning-based MLLMs have achieved a degree of success in generating long-form textual reasoning chains. However, they still struggle with…
Think-RM is a framework that enhances generative reward models with long-horizon reasoning and a novel pairwise RLHF pipeline to improve…
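As a minimal sketch of pairwise judging in general (not Think-RM's pipeline), a generative reward model can be prompted to reason at length before choosing between two candidate responses; the `reward_model.generate()` text interface below is an assumption for illustration.

```python
def pairwise_preference(reward_model, prompt, response_a, response_b):
    """Hypothetical sketch: ask a generative reward model to reason, then return
    which of two responses it prefers. A real pipeline would parse the verdict
    more robustly than checking the final character."""
    query = (
        f"Prompt:\n{prompt}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        "Think step by step, then answer with 'A' or 'B'."
    )
    verdict = reward_model.generate(query)  # long-form reasoning followed by a choice
    return "A" if verdict.strip().endswith("A") else "B"
```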