Browsing: Hugging Face
MAGREF is a unified framework for video generation that uses masked guidance and dynamic masking for coherent multi-subject synthesis from…
Chain-of-thought (CoT) reasoning enables large language models (LLMs) to move beyond fast System-1 responses and engage in deliberative System-2 reasoning.…
Researchers propose a novel differentiable solver search algorithm that optimizes the computational efficiency and quality of diffusion models for image…
Re-ttention uses temporal redundancy in diffusion models to enable highly sparse attention in visual generation, maintaining quality with minimal computational…
ViGoRL, a vision-language model enhanced with visually grounded reinforcement learning, achieves superior performance across various visual reasoning tasks by dynamically…
Theoretical analysis of Direct Preference Optimization (DPO) reveals that log-ratio reward parameterization is optimal for learning the target policy via preference…
The difficulty-aware prompting method shortens reasoning traces in a dataset, improving model performance and efficiency across various benchmarks. Existing chain-of-thought…
A novel block-wise approximate KV Cache and confidence-aware parallel decoding strategy improve the inference speed of diffusion-based large language models…
TrustVLM enhances the reliability of Vision-Language Models by estimating prediction trustworthiness without retraining, improving misclassification detection in multimodal tasks. Vision-Language…
A novel pairwise-comparison framework using the CreataSet dataset trains CrEval, an LLM-based evaluator that significantly improves the assessment of textual creativity…