Browsing: Hugging Face
Large Language Models (LLMs) exhibit remarkable capabilities but are susceptible to adversarial prompts that exploit vulnerabilities to produce unsafe or…
We introduce TAPIP3D, a novel approach for long-term 3D point tracking in monocular RGB and RGB-D videos. TAPIP3D represents videos…
Scene-level 3D generation represents a critical frontier in multimedia and computer graphics, yet existing approaches either suffer from limited object…
Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common…
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations, where they generate descriptions of nonexistent objects,…
We introduce Perception Encoder (PE), a state-of-the-art encoder for image and video understanding trained via simple vision-language learning. Traditionally, vision…
Scaling test-time compute has emerged as a key ingredient for enabling large language models (LLMs) to solve difficult problems, but…
Current learning-based subject customization approaches, predominantly relying on U-Net architectures, suffer from limited generalization ability and compromised image quality. Meanwhile,…
Computational color constancy, or white balancing, is a key module in a camera’s image signal processor (ISP) that corrects color…
Vision-language models are integral to computer vision research, yet many high-performing models remain closed-source, obscuring their data, design and training…