Browsing: Hugging Face
We propose Lavida-O, a unified Masked Diffusion Model (MDM) for multimodal understanding and generation. Unlike existing multimodal MDMs such as…
Cutting-edge Artificial Intelligence (AI) techniques keep reshaping our view of the world. For example, Large Language Models (LLMs) based applications…
Recent advances in Large Vision-Language models (LVLM) have spurred significant progress in document parsing task. Compared to traditional pipeline-based methods,…
Implicit Chain-of-Thought (CoT) methods present a promising, token-efficient alternative to explicit CoT reasoning in Large Language Models (LLMs), but a…
The remarkable zero-shot capabilities of Large Language Models (LLMs) have propelled natural language processing from task-specific models to unified, generalist…
We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative…
Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D…
Recent advances in foundation models highlight a clear trend toward unification and scaling, showing emergent capabilities across diverse domains. While…
Imitation-learning-based visuomotor policies have been widely used in robot manipulation, where both visual observations and proprioceptive states are typically adopted…
Unified multimodal models have recently attracted considerable attention for their remarkable abilities in jointly understanding and generating diverse content. However,…