Hugging Face
X-Planner, a planning system utilizing a multimodal large language model, decomposes complex text-guided image editing instructions into precise sub-instructions, ensuring…
Mod-X introduces a novel, layered architecture for decentralized, interoperable communication among heterogeneous AI agents, integrating a semantic translation layer, decentralized…
A multimodal agent transforms documents into detailed presentation videos with audio, evaluated using a comprehensive framework involving vision-language models. We…
DiaFORGE is a disambiguation framework that enhances large language models’ ability to invoke enterprise APIs accurately through dialogue synthesis, supervised…
The rapid advancement of Large Language Models (LLMs) has intensified the need for evaluation frameworks that address the requirements of…
Self-Correction Bench measures the self-correction blind spot in large language models, finding that training primarily on error-free responses contributes to…
Energy-Based Transformers (EBTs) generalize System 2 Thinking to arbitrary modalities and problem types using a scalable, unsupervised energy-based optimization framework…
A framework dynamically selects and merges pre-trained domain-specific models for efficient and scalable information extraction tasks. Supervised fine-tuning (SFT) is…
ZeCO, a new sequence-parallelism method with zero communication overhead, enables efficient training of large language models with ultra-long sequences…
Recent progress in vision-language segmentation has significantly advanced grounded visual understanding. However, these models often hallucinate, producing segmentation…