Browsing: Hugging Face
This work studies the challenge of transferring animations between characters whose skeletal topologies differ substantially. While many techniques have advanced…
This study investigates the use of Large Language Models (LLMs) for predicting human-perceived misery scores from natural language descriptions of…
We propose a novel approach to image generation by decomposing an image into a structured sequence, where each element in…
Narrative comprehension of long stories and novels has been a challenging domain due to their intricate plotlines and entangled, often…
We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In…
Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the capability boundaries of multimodal…
Matrix-Game 2.0 generates real-time interactive videos using few-step autoregressive diffusion, addressing the lengthy inference of existing models. AI-generated…
Classifier-free Guidance (CFG) is a widely used technique in modern diffusion models for enhancing sample quality and prompt adherence. However,…
Traditional multimodal learning approaches require expensive alignment pre-training to bridge vision and language modalities, typically projecting visual features into discrete…
We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual…