Browsing: Hugging Face
Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often…
Recent video large language models (Video LLMs) often depend on costly human annotations or proprietary model APIs (e.g., GPT-4o) to…
Controllable character animation remains a challenging problem, particularly in handling rare poses, stylized characters, character-object interactions, complex illumination, and dynamic…
We propose MR. Video, an agentic long video understanding framework that demonstrates the simple yet effective MapReduce principle for processing…
The success of Large Language Models (LLMs) has sparked interest in various agentic applications. A key hypothesis is that LLMs,…
LLM agents are an emerging form of AI systems where large language models (LLMs) serve as the central component, utilizing…
The increasing demand for AR/VR applications has highlighted the need for high-quality 360-degree panoramic content. However, generating high-quality 360-degree panoramic…
Current Large Language Models (LLMs) often undergo supervised fine-tuning (SFT) to acquire tool use capabilities. However, SFT struggles to generalize…
Multi-turn interactions with language models (LMs) pose critical safety risks, as harmful intent can be strategically spread across exchanges. Yet,…
Recent Computer-Using Agents (CUAs), powered by multimodal large language models (LLMs), offer a promising direction for automating complex desktop workflows…