Browsing: Hugging Face
Recent advances in text-to-audio (TTA) generation excel at synthesizing short audio clips but struggle with long-form narrative audio, which requires…
Serving Large Language Models (LLMs) is a GPU-intensive task where traditional autoscalers fall short, particularly for modern Prefill-Decode (P/D) disaggregated…
Motion generation is essential for animating virtual characters and embodied agents. While recent text-driven methods have made significant strides, they…
The ability to research and synthesize knowledge is central to human expertise and progress. An emerging class of systems promises…
3D local editing of specified regions is crucial for the game industry and robot interaction. Recent methods typically edit rendered multi-view…
Inferring the physical properties of 3D scenes from visual information is a critical yet challenging task for creating interactive and…
Visual diffusion models achieve remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution…
Current state-of-the-art (SOTA) methods for audio-driven character animation demonstrate promising performance for scenarios primarily involving speech and singing. However, they…
Recent video foundation models such as SAM2 excel at prompted video segmentation by treating masks as a general-purpose primitive. However,…
Large Language Models (LLMs) have reshaped our world with significant advancements in science, engineering, and society through applications ranging from…