Browsing: Hugging Face
Large Language Models (LLMs) can struggle to balance gullibility to misinformation and resistance to valid corrections in persuasive dialogues, a…
We introduce the first data-driven multi-view 3D point tracker, designed to track arbitrary points in dynamic scenes using multiple camera…
Human social behaviors are inherently multimodal, necessitating the development of powerful audiovisual models for their perception. In this paper, we…
Existing literature typically treats style-driven and subject-driven generation as two disjoint tasks: the former prioritizes stylistic similarity, whereas the latter…
The learning-from-practice paradigm is crucial for developing capable Agentic AI systems, yet it is severely hampered by inefficient…
Diverse instruction data is vital for effective instruction tuning of large language models, as it enables the model to generalize…
We introduce MCP-Bench, a benchmark for evaluating large language models (LLMs) on realistic, multi-step tasks that demand tool use, cross-tool…
Long video generation is fundamentally a long context memory problem: models must retain and retrieve salient events across a long…
As multi-turn dialogues with large language models (LLMs) grow longer and more complex, how can users better evaluate and review…
Recent Vision-Language-Action (VLA) models built on pre-trained Vision-Language Models (VLMs) require extensive post-training, resulting in high computational overhead that limits…