Browsing: Hugging Face
Recent vision-language models (VLMs) show strong results on offline image and video understanding, but their performance in interactive, embodied environments…
Large language models (LLMs) excel at natural language understanding and generation but remain vulnerable to factual errors, limiting their reliability…
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing…
MoVieS synthesizes 4D dynamic novel views from monocular videos using Gaussian primitives, enabling unified modeling of appearance, geometry, and motion…
CompassJudger-2, a generalist judge model, achieves superior performance across multiple benchmarks through task-driven data curation, verifiable rewards, and a refined…
We introduce MetaStone-S1, a pioneering reflective generative model designed to significantly enhance test-time scaling (TTS) capabilities through the new reflective…
Lumos-1 is an autoregressive video generator that uses a modified LLM architecture with MM-RoPE and AR-DF to address spatiotemporal correlation…
Generative reward models using LLMs are vulnerable to superficial manipulations but can be improved with data augmentation strategies.
In this paper, we mainly address two challenges faced by existing MoE architectures: performance compromise caused by imperfect routing, especially…
A part-aware diffusion framework, CoPart, enhances 3D generation by decomposing objects into contextual parts, improving complexity handling, relationship modeling, and…