Abstract
Foundation models (FMs) such as CLIP and SAM have recently shown great promise in image segmentation tasks, yet their adaptation to 3D medical imaging, particularly for pathology detection and segmentation, remains underexplored. A critical challenge arises from the domain gap between natural images and medical volumes: existing FMs, pre-trained on 2D data, struggle to capture 3D anatomical context, limiting their utility in clinical applications such as tumor segmentation. To address this, we propose an adaptation framework called TAGS (Tumor Adaptive Guidance for SAM), which unlocks 2D FMs for 3D medical tasks through multi-prompt fusion. By preserving most of the pre-trained weights, our approach enhances SAM's spatial feature extraction with CLIP's semantic insights and anatomy-specific prompts. Extensive experiments on three open-source tumor segmentation datasets demonstrate that our model surpasses state-of-the-art medical image segmentation models (+46.88% over nnUNet), interactive segmentation frameworks, and other established medical FMs, including SAM-Med2D, SAM-Med3D, SegVol, Universal, 3D-Adapter, and SAM-B (by at least +13%). These results highlight the robustness and adaptability of our framework across diverse medical segmentation tasks.
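The abstract describes fusing frozen 2D foundation-model features with CLIP-derived semantic prompts and anatomy-specific prompts while keeping most pre-trained weights fixed. As a rough illustration of that general idea only (not the authors' actual TAGS implementation, whose details are not given here), the following minimal PyTorch sketch shows one way such a multi-prompt fusion adapter could look; every module name, dimension, and design choice below is an assumption made for illustration.

```python
# Illustrative sketch only: a generic multi-prompt fusion adapter in the spirit
# of the abstract (frozen image-encoder features + semantic prompts), NOT the
# authors' TAGS implementation. All dimensions and module names are assumed.
import torch
import torch.nn as nn


class MultiPromptFusionAdapter(nn.Module):
    """Fuses frozen image-encoder tokens with semantic prompt embeddings
    (e.g., CLIP text features and learnable anatomy-specific prompts) via
    cross-attention, adding only a small number of trainable parameters."""

    def __init__(self, img_dim=256, prompt_dim=512, n_heads=8):
        super().__init__()
        # Project prompt embeddings (e.g., CLIP text features) into the image feature space.
        self.prompt_proj = nn.Linear(prompt_dim, img_dim)
        # Cross-attention: image tokens (queries) attend to the concatenated prompt tokens.
        self.cross_attn = nn.MultiheadAttention(img_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_tokens, clip_prompts, anatomy_prompts):
        # img_tokens:      (B, N, img_dim)     flattened features from a frozen image encoder
        # clip_prompts:    (B, K1, prompt_dim) semantic text embeddings
        # anatomy_prompts: (B, K2, prompt_dim) anatomy-specific prompt tokens
        prompts = torch.cat([clip_prompts, anatomy_prompts], dim=1)
        prompts = self.prompt_proj(prompts)                       # (B, K1+K2, img_dim)
        fused, _ = self.cross_attn(img_tokens, prompts, prompts)  # queries = image tokens
        return self.norm(img_tokens + fused)                      # residual keeps frozen features intact


if __name__ == "__main__":
    adapter = MultiPromptFusionAdapter()
    img = torch.randn(2, 1024, 256)   # e.g., tokens from a frozen SAM-style image encoder
    txt = torch.randn(2, 4, 512)      # e.g., CLIP text embeddings for tumor/organ class names
    anat = torch.randn(2, 8, 512)     # learnable anatomy-specific prompts
    print(adapter(img, txt, anat).shape)  # torch.Size([2, 1024, 256])
```

The residual connection and the small trainable projection/attention layers reflect the parameter-efficient spirit described in the abstract (most pre-trained weights preserved); how TAGS actually injects 3D anatomical context is specified in the full paper.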