📄 Abstract
Medical image segmentation is vital for clinical diagnosis, yet current deep
learning methods often demand extensive expert effort, either to annotate
large training datasets or to provide prompts at inference time for
each new case. This paper introduces a zero-shot and automatic segmentation
pipeline that combines off-the-shelf vision-language and segmentation
foundation models. Given a medical image and a task definition (e.g., "segment
the optic disc in an eye fundus image"), our method uses a grounding model to
generate an initial bounding box, followed by a visual prompt boosting module
that enhances the prompts, which are then processed by a promptable segmentation
model to produce the final mask. To address the challenges of domain gap and
result verification, we introduce a test-time adaptation framework featuring a
set of learnable adaptors that align the medical inputs with foundation model
representations. Its hyperparameters are optimized via Bayesian Optimization,
guided by a proxy validation model without requiring ground-truth labels. Our
pipeline offers an annotation-efficient and scalable solution for zero-shot
medical image segmentation across diverse tasks. Our pipeline is evaluated on
seven diverse medical imaging datasets and shows promising results. By proper
decomposition and test-time adaptation, our fully automatic pipeline not only
substantially surpasses the previously best-performing method, yielding a 69%
relative improvement in accuracy (Dice Score from 42.53 to 71.81), but also
performs competitively with weakly-prompted interactive foundation models.
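The headline numbers can be checked with a short sketch. The Dice definition below is the standard overlap formula for binary masks, not code from the paper; the function names and the convention that two empty masks score 1.0 are assumptions made for illustration. The abstract reports Dice on a 0 to 100 scale, while this function returns a value between 0 and 1.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_improvement(baseline, new):
    """Fractional gain of `new` over `baseline`."""
    return (new - baseline) / baseline


# Dice scores quoted in the abstract: 42.53 (previous best) and 71.81 (this pipeline).
gain = relative_improvement(42.53, 71.81)
print(f"{100 * gain:.1f}%")  # about 68.8%, which the abstract rounds to 69%
```

The 69% figure is therefore a relative gain; the absolute gain is 29.28 Dice points.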
Key Contributions
Introduces AutoMiSeg, an automatic zero-shot medical image segmentation pipeline built from foundation models. It chains a grounding model that produces bounding boxes, a prompt boosting module, and a promptable segmentation model. Test-time adaptation with learnable adaptors bridges the domain gap, so the pipeline needs neither extensive expert annotation nor per-case prompting.
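The three-stage decomposition can be sketched as a composition of callables. This is only an illustration of the data flow described above, not the authors' implementation: `run_pipeline`, `Prompts`, and the `toy_*` stand-ins are hypothetical names, the box convention is assumed, and the learnable adaptors and Bayesian optimization are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1); assumed pixel convention
Image = List[List[float]]         # toy stand-in for an image array
Mask = List[List[int]]


@dataclass
class Prompts:
    box: Box
    points: List[Tuple[int, int]]


def run_pipeline(
    image: Image,
    task: str,
    ground: Callable[[Image, str], Box],
    boost: Callable[[Image, Box], Prompts],
    segment: Callable[[Image, Prompts], Mask],
) -> Mask:
    """Grounding -> prompt boosting -> promptable segmentation."""
    box = ground(image, task)        # initial bounding box from the task text
    prompts = boost(image, box)      # enriched visual prompts
    return segment(image, prompts)   # final mask


# Hypothetical toy stages so the sketch runs without any real model.
def toy_ground(image: Image, task: str) -> Box:
    h, w = len(image), len(image[0])
    return (w // 4, h // 4, 3 * w // 4, 3 * h // 4)


def toy_boost(image: Image, box: Box) -> Prompts:
    x0, y0, x1, y1 = box
    return Prompts(box=box, points=[((x0 + x1) // 2, (y0 + y1) // 2)])


def toy_segment(image: Image, prompts: Prompts) -> Mask:
    x0, y0, x1, y1 = prompts.box
    return [
        [1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(len(image[0]))]
        for y in range(len(image))
    ]


image = [[0.0] * 8 for _ in range(8)]
mask = run_pipeline(
    image,
    "segment the optic disc in an eye fundus image",
    toy_ground,
    toy_boost,
    toy_segment,
)
```

Passing the stages as arguments mirrors the off-the-shelf framing: each stage could be replaced by a different foundation model without changing the surrounding flow.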
Business Value
Reduces the expert annotation and per-case prompting effort that medical image segmentation usually requires, which could lower its cost and turnaround time and make advanced segmentation accessible for clinical diagnosis and research without specialized expertise.