Browsing: Hugging Face
Foundation models are increasingly becoming better autonomous programmers, raising the prospect that they could also automate dangerous offensive cyber-operations. Current…
DC-CoT provides a comprehensive benchmark for assessing data-centric distillation techniques in chain-of-thought distillation, focusing on performance and generalization across different…
VLMs are more vulnerable to harmful meme-based prompts than to synthetic images, and while multi-turn interactions offer some protection, significant…
A framework called QwenLong-L1 enhances large reasoning models for long-context reasoning through reinforcement learning, achieving leading performance on document question-answering…
The Transformer Copilot framework enhances large language model performance through a Copilot model that refines the Pilot’s logits based on…
Policy gradient algorithms have been successfully applied to enhance the reasoning capabilities of large language models (LLMs). Despite the widespread…
Orthogonal Residual Updates enhance feature learning and training stability by decomposing module outputs to contribute primarily novel features. Residual connections…
Synthetic Data RL enhances foundation models through reinforcement learning using only synthetic data, achieving performance comparable to models trained with…
Temporal reasoning is pivotal for Large Language Models (LLMs) to comprehend the real world. However, existing works neglect the real-world…
RIPT-VLA is a reinforcement learning-based interactive post-training paradigm that enhances pretrained Vision-Language-Action models using sparse binary success rewards, improving adaptability…