Browsing: Hugging Face
Captain Cinema generates high-quality short movies from textual descriptions using top-down keyframe planning and bottom-up video synthesis with interleaved training…
Despite the remarkable developments achieved by recent 3D generation works, scaling these methods to geographic extents, such as modeling thousands…
Large reasoning models achieve remarkable performance through extensive chain-of-thought generation, yet exhibit significant computational inefficiency by applying uniform reasoning strategies…
DMOSpeech 2 optimizes duration prediction and introduces teacher-guided sampling to enhance speech synthesis performance and diversity. Diffusion-based text-to-speech (TTS) systems…
TeEFusion enhances text-to-image synthesis by efficiently incorporating classifier-free guidance into text embeddings, reducing inference costs without sacrificing image quality. Recent…
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for enhancing the reasoning capabilities of LLMs. Existing…
Elevate3D enhances both texture and geometry of low-quality 3D assets using HFS-SDEdit and monocular geometry predictors, achieving superior refinement quality.
The Turing Eye Test evaluates MLLMs’ perceptual abilities through synthetic images, revealing that vision tower generalization is a significant gap…
Ultra3D uses VecSet and Part Attention to accelerate 3D voxel generation while maintaining high quality and resolution. Recent advances in…
HOComp uses MLLMs and attention mechanisms to achieve seamless human-object interactions with consistent appearances in image compositing. While existing image-guided…