Browsing: Hugging Face
A large-scale benchmark, MMHU, is proposed for human behavior analysis in autonomous driving, featuring rich annotations and diverse data sources…
MOSPA: Human Motion Generation Driven by Spatial Audio. Project Page: https://frank-zy-dou.github.io/projects/MOSPA/index.html Paper: https://arxiv.org/abs/2507.11949 Abstract: Enabling virtual humans to dynamically and realistically…
Lizard is a linearization framework that transforms Transformer-based LLMs into subquadratic architectures for efficient infinite-context generation, using a hybrid attention…
SWE-Perf is a benchmark for evaluating Large Language Models in code performance optimization using real-world repository data. Code performance optimization…
AnyI2V is a training-free framework that animates conditional images with user-defined motion trajectories, supporting various data types and enabling flexible…
A benchmark evaluates multimodal models’ ability to interpret scientific schematic diagrams and answer related questions, revealing performance gaps and insights…
Recent advancements in reasoning-based Large Language Models (LLMs), particularly their potential through test-time scaling, have created significant opportunities for distillation…
Recent vision-language models (VLMs) show strong results on offline image and video understanding, but their performance in interactive, embodied environments…
Large language models (LLMs) excel at natural language understanding and generation but remain vulnerable to factual errors, limiting their reliability…
Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing…