Browsing: Hugging Face
OpenAI’s multimodal GPT-4o has demonstrated remarkable capabilities in image generation and editing, yet its ability to achieve world knowledge-informed semantic…
Large language models learn and continually learn through the accumulation of gradient-based updates, but how individual pieces of new information…
Recent work on improving the reasoning ability of large multimodal models (LMMs) through reinforcement learning has made great progress. However, most existing…
Effective reasoning is crucial to solving complex mathematical problems. Recent large language models (LLMs) have boosted performance by scaling test-time…
Graphical User Interface (GUI) agents offer cross-platform solutions for automating complex digital tasks, with significant potential to transform productivity workflows.…
Scientific equation discovery is a fundamental task in the history of scientific progress, enabling the derivation of laws governing natural…
Overview of EmoEval for Evaluating Mental Safety of AI-human Interactions. The simulation consists of four steps: (1) User Agent Initialization…
World modeling is a crucial task for enabling intelligent agents to effectively interact with humans and operate in dynamic environments.…
Natural Language to SQL (NL2SQL) enables intuitive interactions with databases by transforming natural language queries into structured SQL statements. Despite…
We propose a new problem, In-2-4D, for generative 4D (i.e., 3D + motion) inbetweening from a minimalistic input setting: two…