Browsing: Hugging Face
Strategies including prompting and contrastive frameworks using latent concepts from sparse autoencoders effectively personalize LLM translations in low-resource settings while…
When Do LLMs Admit Their Mistakes? Understanding the Role of Model Belief in Retraction
LLMs rarely retract incorrect answers they believe to be factually correct, but supervised fine-tuning can improve their retraction performance by…
SAKURA is introduced to evaluate the multi-hop reasoning abilities of large audio-language models, revealing their struggles in integrating speech/audio representations.…
A dataset benchmarks spatial and physical reasoning of LLMs using topology optimization tasks without simulation tools. We introduce a novel…
RoPECraft is a training-free method that modifies rotary positional embeddings in diffusion transformers to transfer motion from reference videos, enhancing…
Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning
Modern BPE tokenizers often split calendar dates into meaningless fragments,…
An enhanced multimodal language model incorporates thinking process rewards to improve reasoning and generalization, achieving superior performance on benchmarks compared…
Project Page: https://haoningwu3639.github.io/SpatialScore/ Paper: https://arxiv.org/abs/2505.17012/ Code: https://github.com/haoningwu3639/SpatialScore/ Data: https://huggingface.co/datasets/haoningwu/SpatialScore We are currently organizing our data and code, and expect to open-source them within…
A benchmark called VideoGameQA-Bench is introduced to assess Vision-Language Models in video game quality assurance tasks. With video games now…
A novel method called GRIT enhances visual reasoning in MLLMs by generating reasoning chains that integrate both natural language and…