Browsing: Hugging Face
A residual learning approach enhances Sparse Autoencoders to capture domain-specific features without retraining, improving interpretability and performance on specialized domains.…
AutoSteer, a modular inference-time intervention technology, enhances the safety of Multimodal Large Language Models by reducing attack success rates across…
arXiv Link AnyCap Project is a unified captioning framework, dataset, and benchmark that supports image, audio, and video captioning with…
MindJourney enhances vision-language models with 3D reasoning by coupling them with a video diffusion-based world model, achieving improved performance on…
VisionThink dynamically adjusts image resolution and visual token processing for efficient and effective vision-language tasks, improving performance on OCR tasks…
Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation (TLB-VFI) improves video frame interpolation by efficiently extracting temporal information, reducing…
FLEXITOKENS, a byte-level language model with a learnable tokenizer, reduces token over-fragmentation and improves performance across multilingual and morphologically diverse…
Producing expressive facial animations from static images is a challenging task. Prior methods relying on explicit geometric priors (e.g., facial…
PhysX addresses the lack of physical properties in 3D generative models by introducing PhysXNet, a physics-annotated dataset, and PhysXGen, a…
DrafterBench is an open-source benchmark for evaluating LLM agents in technical drawing revision, assessing their capabilities in structured data comprehension,…