AnyCap Project is a unified captioning framework, dataset, and benchmark that supports image, audio, and video captioning with controllable styles. It’s fully open-sourced, covering training, evaluation, and benchmarking!
✨ Highlights
🏆 Unified Multi-modal Captioning
A single framework for:
Image Captioning
Audio Captioning
Video Captioning
All in a single framework, with support for modality-specific components.
📝 Customizable Captioning
Control the content and style of captions via a single user text prompt:
Content: Background, Event, Instance, Action, Instance Appearance, Region and so on
Style: Brief, Detail, Genre, Length, Theme
This allows captions to be tailored to specific user needs; see the illustrative sketch below.
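As a rough illustration only (the actual interface may differ), a control prompt can combine a content focus and a style constraint in one user message. The `build_control_prompt` helper and the commented-out `generate_caption` call below are hypothetical placeholders, not the project's real API; see the repository for actual usage.

```python
# Hypothetical sketch: composing a single controllable-captioning prompt.
# build_control_prompt() and generate_caption() are placeholders, not the
# project's actual API.

def build_control_prompt(content: str, style: str) -> str:
    """Combine a content focus and a style constraint into one user prompt."""
    return (
        f"Describe the {content} of this image. "
        f"Write the caption in a {style} style."
    )

prompt = build_control_prompt(
    content="background and main instances",
    style="brief",
)
print(prompt)
# caption = generate_caption(image="example.jpg", prompt=prompt)  # placeholder call
```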
📊 Open Benchmark & Evaluation: AnyCapEval
An industry-level benchmark with:
Modality-specific test sets (image/audio/video)
Content-related metrics
Style-related metrics
Yields more accurate and lower-variance caption evaluation.
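For a sense of how the two metric families might be reported, here is a minimal aggregation sketch. The file name and score keys (`content_score`, `style_score`) are assumptions for illustration, not AnyCapEval's actual output schema; consult the repository for the real evaluation pipeline.

```python
import json
from statistics import mean

# Hypothetical sketch: averaging per-sample content and style scores.
# The results file layout and key names are assumptions, not the actual
# AnyCapEval format.
def summarize(results_path: str) -> dict:
    with open(results_path) as f:
        results = json.load(f)  # expected: a list of per-sample score dicts
    return {
        "content": mean(r["content_score"] for r in results),
        "style": mean(r["style_score"] for r in results),
    }

# summary = summarize("anycapeval_image_results.json")  # placeholder path
```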
🛠️ End-to-End Open Source
Everything you need is included:
✅ Full training data
✅ Model inference pipeline
✅ Evaluation benchmark
All available under a permissive open-source license.
🔗 Get Started
Check out the paper and code:
📄 Paper: arXiv:2507.12841
📦 Code & Models: GitHub
📬 Contact
For questions, collaborations, or benchmark submissions, please reach out via the paper’s contact email.