Browsing: Hugging Face
💫 Excited to share our recent work: BrowseComp-ZH, the first high-difficulty benchmark specifically designed to evaluate large language models (LLMs)…
Most existing video anomaly detectors rely solely on RGB frames, which lack the temporal resolution needed to capture abrupt or…
Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when…
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning…
Action customization involves generating videos where the subject performs actions dictated by input control signals. Current methods use pose-guided or…
Large Language Models (LLMs) have demonstrated unprecedented capabilities across various natural language processing tasks. Their ability to process and generate…
Recent advancements in AI-driven soccer understanding have demonstrated rapid progress, yet existing research predominantly focuses on isolated or narrow tasks.…
✨ Highlights. Low Latency: VITA-Audio is the first end-to-end speech model capable of generating audio during the initial forward pass.…
The rapid advancement of diffusion models holds the promise of revolutionizing the application of VR and AR technologies, which typically…
In recent years, multi-agent frameworks powered by large language models (LLMs) have advanced rapidly. Despite this progress, there is still…