Browsing: Hugging Face
This paper presents an effective approach for learning novel 4D embodied world models, which predict the dynamic evolution of 3D…
The exposure of large language models (LLMs) to copyrighted material during pre-training raises concerns about unintentional copyright infringement post deployment.…
Abstract: Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they…
Although contemporary text-to-image generation models have achieved remarkable breakthroughs in producing visually appealing images, their capacity to generate precise and…
Mathematical geometric problem solving (GPS) often requires effective integration of multimodal information and verifiable logical coherence. Despite the fast development…
Downsampling layers are crucial building blocks in CNN architectures, which help to increase the receptive field for learning high-level features…
The integration of long-context capabilities with visual understanding unlocks unprecedented potential for Vision Language Models (VLMs). However, the quadratic attention…
Global healthcare providers are exploring use of large language models (LLMs) to provide medical advice to the public. LLMs now…
We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding. CameraBench consists of ~3,000…
Given a single labeled example, in-context segmentation aims to segment corresponding objects. This setting, known as one-shot segmentation in few-shot…