Browsing: Hugging Face
We introduce Complex-Edit, a comprehensive benchmark designed to systematically evaluate instruction-based image editing models across instructions of varying complexity. To…
While data synthesis and distillation are promising strategies to enhance small language models, current approaches heavily rely on Large Language…
Large Video Models (LVMs) built upon Large Language Models (LLMs) have shown promise in video understanding but often suffer from…
World simulation has gained increasing popularity due to its ability to model virtual environments and predict the consequences of actions.…
Large Language Models (LLMs) have shown tremendous potential as agents, excelling at tasks that require multiple rounds of reasoning and…
Charts are ubiquitous, as people often use them to analyze data, answer questions, and discover critical insights. However, performing complex…
The success of text-to-image (T2I) generation models has spurred a proliferation of model checkpoints fine-tuned from the same base…
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses. However, in…
Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences.…
Ensuring the ethical deployment of text-to-image models requires effective techniques to prevent the generation of harmful or inappropriate content. While…