Browsing: Hugging Face
AutoMat, an agent-assisted pipeline, transforms atomic-resolution STEM images into simulation-ready atomic crystal structures and predicts their properties, overcoming the bottleneck…
UniVG-R1, a reasoning-guided multimodal large language model, enhances visual grounding by leveraging reinforcement learning and a difficulty-aware strategy, achieving state-of-the-art…
Soft Thinking, a training-free method, enhances reasoning by generating soft, abstract concept tokens in a continuous space, improving accuracy and…
RLVR-World uses reinforcement learning with verifiable rewards to optimize world models for task-specific metrics, achieving improved performance across language and…
Delayed KV-Cache, a KV-cache-like mechanism, accelerates inference in diffusion language models without significantly degrading performance. Diffusion Language Models (DLMs) have been…
Code-switching is a common phenomenon of alternating between different languages in the same utterance, thought, or conversation. We posit that…
Large multimodal models excel in multimodal tasks but face significant computational challenges due to excessive computation on visual tokens. Unlike…
Despite impressive advancements in Visual-Language Models (VLMs) for multi-modal tasks, their reliance on RGB inputs limits precise spatial understanding. Existing…
LLM pruning has emerged as a promising technology for compressing LLMs, enabling their deployment on resource-limited devices. However, current methodologies…
This paper introduces Flexive, a novel generative verifier, and the Solve-Detect-Verify pipeline to address the trade-off between accuracy and computational…