Browsing: Hugging Face
Large vision-language models (LVLMs) remain vulnerable to hallucination, often generating content misaligned with visual inputs. While recent approaches advance multi-modal…
Reinforcement learning (RL) substantially improves the downstream task performance of large language models (LLMs) and their alignment with human values. Surprisingly,…
A training-free framework named 3DTown generates realistic 3D scenes from a single top-down image using region-based generation and spatial-aware 3D…
There is a newly identified risk that creators of open-source LLMs can extract fine-tuning data from downstream models through backdoor…
An Internet-augmented text-to-image generation framework improves the handling of uncertain text prompts by integrating reference images, enhancing image quality and fidelity. Current…
AutoMat, an agent-assisted pipeline, transforms atomic-resolution STEM images into simulation-ready atomic crystal structures and predicts their properties, overcoming the bottleneck…
UniVG-R1, a reasoning-guided multimodal large language model, enhances visual grounding by leveraging reinforcement learning and a difficulty-aware strategy, achieving state-of-the-art…
Soft Thinking, a training-free method, enhances reasoning by generating soft, abstract concept tokens in a continuous space, improving accuracy and…
RLVR-World uses reinforcement learning with verifiable rewards to optimize world models for task-specific metrics, achieving improved performance across language and…
A KV-cache-like mechanism, delayed KV-Cache, accelerates inference in diffusion language models without significantly degrading performance. Diffusion Language Models (DLMs) have been…