The reliability of large language models (LLMs) during test-time scaling is often assessed with \emph{external verifiers} or \emph{reward models} that…
Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We…
Graphical user interface (GUI) agents built on vision-language models have emerged as a promising approach to automate human-computer workflows. However,…
Recent advances in reasoning capabilities of large language models (LLMs) are largely driven by reinforcement learning (RL), yet the underlying…
Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based…
Obtaining high-quality generations in modern LLMs has largely been framed as a selection problem: identifying a single winning generation from…
The training paradigm for large language models (LLMs) is moving from static datasets to experience-based learning, where agents acquire skills…
While recent generative models advance pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge,…
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key ingredient for unlocking complex reasoning capabilities in large language…
Large Language Model (LLM) safety is one of the most pressing challenges for enabling wide-scale deployment. While most studies and…