PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis, by Anwesa Choudhuri and 4 other authors
Abstract: Early detection, accurate segmentation, classification, and tracking of polyps during colonoscopy are critical for preventing colorectal cancer. Many existing deep-learning-based methods for analyzing colonoscopic videos either require task-specific fine-tuning, lack tracking capabilities, or rely on domain-specific pre-training. In this paper, we introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification, and unsupervised tracking in colonoscopic videos. Our approach leverages a novel conditional mask loss that enables flexible training across datasets annotated with either pixel-level segmentation masks or bounding boxes, allowing us to bypass task-specific fine-tuning. Our unsupervised tracking module reliably associates polyp instances across frames using object queries, without relying on any heuristics. We leverage a robust vision foundation model backbone pre-trained without supervision on natural images, thereby removing the need for domain-specific pre-training. Extensive experiments on multiple polyp benchmarks demonstrate that our method significantly outperforms existing state-of-the-art approaches in detection, segmentation, classification, and tracking.
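The paper does not spell out the form of the conditional mask loss, but the idea of switching supervision based on the available annotation type can be sketched as follows. This is a minimal, hypothetical illustration assuming a BoxInst-style projection loss for box-only instances; the function name `conditional_mask_loss` and all tensor shapes are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def conditional_mask_loss(pred_masks, gt_masks, gt_boxes, has_mask):
    """Per-instance loss that conditions on the annotation type.

    pred_masks: (N, H, W) mask logits, one per polyp instance
    gt_masks:   (N, H, W) binary masks (ignored where has_mask[i] is False)
    gt_boxes:   (N, 4) boxes as (x0, y0, x1, y1) in pixel coordinates
    has_mask:   length-N flags; True if instance i has a pixel-level mask
    """
    losses = []
    for i in range(pred_masks.shape[0]):
        pred = pred_masks[i]  # (H, W) logits
        if has_mask[i]:
            # Full pixel-level supervision when a mask annotation exists.
            losses.append(
                F.binary_cross_entropy_with_logits(pred, gt_masks[i])
            )
        else:
            # Box-only supervision (assumed here): project the predicted mask
            # onto each axis and match 1-D targets derived from the box extent.
            x0, y0, x1, y1 = gt_boxes[i]
            H, W = pred.shape
            tx = torch.zeros(W)
            tx[int(x0):int(x1)] = 1.0
            ty = torch.zeros(H)
            ty[int(y0):int(y1)] = 1.0
            px = pred.sigmoid().max(dim=0).values  # x-axis projection, (W,)
            py = pred.sigmoid().max(dim=1).values  # y-axis projection, (H,)
            losses.append(
                F.binary_cross_entropy(px, tx) + F.binary_cross_entropy(py, ty)
            )
    return torch.stack(losses).mean()
```

Because both branches produce a differentiable scalar, instances with masks and instances with only boxes can be mixed freely in one batch, which is what lets a single model train across heterogeneously annotated datasets.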
Submission history
From: Anwesa Choudhuri
[v1] Mon, 31 Mar 2025 14:00:21 UTC (1,422 KB)
[v2] Wed, 2 Apr 2025 19:58:56 UTC (1,422 KB)