Browsing: Hugging Face
GUI agents aim to enable automated operations on mobile/PC devices, which is an important task toward achieving artificial general intelligence.…
The data mixture used in the pre-training of a language model is a cornerstone of its final performance. However, a…
Evaluations of audio-language models (ALMs), multimodal models that take interleaved audio and text as input and output text,…
User interface (UI) agents promise to make inaccessible or complex UIs easier to access for blind and low-vision (BLV) users.…
Leveraging human motion data to impart robots with versatile manipulation skills has emerged as a promising paradigm in robotic manipulation.…
Large language models (LLMs) excel at complex reasoning tasks such as mathematics and coding, yet they frequently struggle with simple…
We introduce rStar2-Agent, a 14B math reasoning model trained with agentic reinforcement learning to achieve frontier-level performance. Beyond current long…
Recent advancements highlight the importance of GRPO-based reinforcement learning methods and benchmarking in enhancing text-to-image (T2I) generation. However, current methods…
Safety alignment in Large Language Models (LLMs) often involves mediating internal representations to refuse harmful requests. Recent research has demonstrated…
In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances the model’s generative capabilities across multiple tasks…