Browsing: Hugging Face

Hugging Face

Paper page – RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers

Advanced AI Bot · May 24, 2025

RoPECraft is a training-free method that modifies rotary positional embeddings in diffusion transformers to transfer motion from reference videos, enhancing…

Hugging Face

Paper page – Date Fragments: A Hidden Bottleneck of Tokenization for Temporal Reasoning

Advanced AI Bot · May 24, 2025

Modern BPE tokenizers often split calendar dates into meaningless fragments,…

Hugging Face

Paper page – SophiaVL-R1: Reinforcing MLLMs Reasoning with Thinking Reward

Advanced AI Bot · May 24, 2025

An enhanced multimodal language model incorporates thinking process rewards to improve reasoning and generalization, achieving superior performance on benchmarks compared…

Hugging Face

Paper page – SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

Advanced AI Bot · May 24, 2025

Project Page: https://haoningwu3639.github.io/SpatialScore/
Paper: https://arxiv.org/abs/2505.17012/
Code: https://github.com/haoningwu3639/SpatialScore/
Data: https://huggingface.co/datasets/haoningwu/SpatialScore

We are currently organizing our data and code, and expect to open-source them within…

Hugging Face

Paper page – VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

Advanced AI Bot · May 23, 2025

A benchmark called VideoGameQA-Bench is introduced to assess Vision-Language Models in video game quality assurance tasks. With video games now…

Hugging Face

GRIT: Teaching MLLMs to Think with Images

Advanced AI Bot · May 23, 2025

A novel method called GRIT enhances visual reasoning in MLLMs by generating reasoning chains that integrate both natural language and…

Hugging Face

Paper page – SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

Advanced AI Bot · May 23, 2025

SafeKey enhances the safety of large reasoning models by focusing on activating a safety aha moment in the key sentence…

Hugging Face

Paper page – Training-Free Reasoning and Reflection in MLLMs

Advanced AI Bot · May 23, 2025

The FRANK Model enhances multimodal LLMs with reasoning and reflection abilities without retraining, using a hierarchical weight merging approach that…

Hugging Face

Paper page – Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets

Advanced AI Bot · May 23, 2025

Robo2VLM, a framework for generating Visual Question Answering datasets using robot trajectory data, enhances and evaluates Vision-Language Models by leveraging…

Hugging Face

Paper page – OViP: Online Vision-Language Preference Learning

Advanced AI Bot · May 23, 2025

Large vision-language models (LVLMs) remain vulnerable to hallucination, often generating content misaligned with visual inputs. While recent approaches advance multi-modal…

What's Hot

C3 AI Stock Is Soaring Today: Here’s Why – C3.ai (NYSE:AI)

Nvidia To Be Hit By China Chip Export Curbs Or Deliver Q2 Guidance Surprise After Middle East Deal? Here’s What Charts Show Ahead Of Q1 Results – NVIDIA (NASDAQ:NVDA), Oracle (NYSE:ORCL)

Paper page – Language-Image Alignment with Fixed Text Encoders
