Browsing: Hugging Face
We present visual action prompts, a unified action representation for action-to-video generation of complex high-DoF interactions while maintaining transferable visual…
Video relighting is a challenging yet valuable task, aiming to replace the background in videos while correspondingly adjusting the lighting…
Paper search is an important activity for researchers, typically involving using a query with description of a topic to find…
We introduce StyleMM, a novel framework that can construct a stylized 3D Morphable Model (3DMM) based on user-defined text descriptions…
As Multimodal Large Language Models (MLLMs) gain widespread applicability, it is becoming increasingly desirable to adapt them for diverse user…
Deep learning has revolutionized medical imaging, but its effectiveness is severely limited by insufficient labeled training data. This paper introduces…
Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities across various tasks, but still struggle with complex mathematical reasoning. Existing…
A benchmark called PRELUDE evaluates long-context understanding by assessing the consistency of prequel stories with original books, revealing significant challenges…
While Multimodal Large Language Models (MLLMs) show immense promise for achieving truly human-like interactions, progress is hindered by the lack…
Reinforcement learning with verifiable rewards (RLVR), which typically adopts Pass@1 as the reward, has faced the issues in balancing exploration…