Browsing: Hugging Face
Safe-Sora embeds invisible watermarks into AI-generated videos using a hierarchical adaptive matching mechanism and a 3D wavelet transform-enhanced Mamba architecture,…
DetailFlow, a coarse-to-fine 1D autoregressive image generation method, improves quality and efficiency by using a novel next-detail prediction strategy, fewer…
DualParal, a distributed inference strategy, is designed to address high processing latency and memory costs in diffusion transformer-based video diffusion…
SoloSpeech, a cascaded generative pipeline, improves target speech extraction and speech separation by addressing artifact introduction, naturalness reduction, and environment…
VisTA, a reinforcement learning framework, enhances visual reasoning by autonomously selecting and combining tools from a diverse library without extensive…
CLEANMOL, a novel framework, enhances structural comprehension in large language models for molecular science by formulating SMILES parsing into structured…
State-of-the-art text-to-motion generation models rely on the kinematic-aware, local-relative motion representation popularized by HumanML3D, which encodes motion relative to the…
In this work, we aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL)…
Shorter reasoning chains in LLMs can achieve similar or better performance with reduced computational cost and inference time compared to…
Open-source Code Graph Models enhance repository-level code generation tasks by integrating code graph structures into LLMs’ attention mechanisms, achieving high…