Browsing: Hugging Face
Humans naturally share information with those they are connected to, and video has become one of the dominant mediums for…
Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The…
Recognizing and reasoning about occluded (partially or fully hidden) objects is vital to understanding visual scenes, as occlusions frequently occur…
Intellectual Property (IP) is a unique domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. As…
Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often…
Recent video large language models (Video LLMs) often depend on costly human annotations or proprietary model APIs (e.g., GPT-4o) to…
Controllable character animation remains a challenging problem, particularly in handling rare poses, stylized characters, character-object interactions, complex illumination, and dynamic…
We propose MR. Video, an agentic long video understanding framework that demonstrates the simple yet effective MapReduce principle for processing…
The success of Large Language Models (LLMs) has sparked interest in various agentic applications. A key hypothesis is that LLMs,…
LLM agents are an emerging form of AI systems where large language models (LLMs) serve as the central component, utilizing…