Browsing: Hugging Face
TrustVLM enhances the reliability of Vision-Language Models by estimating prediction trustworthiness without retraining, improving misclassification detection in multimodal tasks. Vision-Language…
A novel pairwise-comparison framework using CreataSet dataset trains CrEval, an LLM-based evaluator that significantly improves the assessment of textual creativity…
ChartLens enhances multimodal language models with fine-grained visual attributions, improving the accuracy of chart understanding by 26-66%. The growing capabilities…
(based on a thread on Twitter) Preferences drive modern LLM research and development: from model alignment to evaluation.But how well…
The introduction of SridBench, a benchmark for scientific figure generation, reveals that current top-tier models, such as GPT-4o-image, fall short…
CheXStruct and CXReasonBench evaluate Large Vision-Language Models in clinical diagnosis by assessing structured reasoning, visual grounding, and generalization using the…
The study explores the structural patterns of knowledge in large language models from a graph perspective, uncovering knowledge homophily and…
UniTEX generates high-quality, consistent 3D textures by using Texture Functions and adapting Diffusion Transformers directly from images and geometry without…
Audio source separation is fundamental for machines to understand complex acoustic environments and underpins numerous audio applications. Current supervised deep…
Classifier-Free Guidance (CFG) significantly enhances controllability in generative models by interpolating conditional and unconditional predictions. However, standard CFG often employs…