Browsing: Hugging Face
UniTEX generates high-quality, consistent 3D textures by using Texture Functions and adapting Diffusion Transformers directly from images and geometry without…
Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep…
Classifier-Free Guidance (CFG) significantly enhances controllability in generative models by interpolating conditional and unconditional predictions. However, standard CFG often employs…
VRAG-RL, a reinforcement learning framework, enhances reasoning and visual information handling in RAG methods by integrating visual perception tokens and…
EPiC is a framework for efficient 3D camera control in video diffusion models that constructs high-quality anchor videos through first-frame…
PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models
The work introduces a dataset and model for generating high-quality, multi-layer transparent images using diffusion models and a novel synthesis…
The paper introduces a novel training paradigm, model immunization, in which curated, labeled falsehoods are periodically injected into the training of language models,…
The study investigates the role of sparse computational components in the instruction-following capabilities of Large Language Models through systematic analysis…
MUSEG, an RL-based method with timestamp-aware multi-segment grounding, significantly enhances the temporal understanding of large language models by improving alignment…
While the capabilities of Large Language Models (LLMs) have been studied in both Simplified and Traditional Chinese, it is yet…