Browsing: Hugging Face
VRAG-RL, a reinforcement learning framework, enhances reasoning and visual information handling in RAG methods by integrating visual perception tokens and…
EPiC is a framework for efficient 3D camera control in video diffusion models that constructs high-quality anchor videos through first-frame…
Paper page – PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
The work introduces a dataset and model for generating high-quality, multi-layer transparent images using diffusion models and a novel synthesis…
The paper introduces a novel training paradigm—model immunization—where curated, labeled falsehoods are periodically injected into the training of language models,…
The study investigates the role of sparse computational components in the instruction-following capabilities of Large Language Models through systematic analysis…
MUSEG, an RL-based method with timestamp-aware multi-segment grounding, significantly enhances the temporal understanding of large language models by improving alignment…
While the capabilities of Large Language Models (LLMs) have been studied in both Simplified and Traditional Chinese, it is yet…
HuggingKG, a large-scale knowledge graph, enhances open-source ML resource management by enabling advanced queries and analyses via HuggingBench. The…
Safe-Sora embeds invisible watermarks into AI-generated videos using a hierarchical adaptive matching mechanism and a 3D wavelet transform-enhanced Mamba architecture,…
DetailFlow, a coarse-to-fine 1D autoregressive image generation method, improves quality and efficiency by using a novel next-detail prediction strategy, fewer…