Abstract: The Segment Anything Model (SAM) excels at generating precise object masks
from input prompts but lacks semantic awareness, failing to associate its
generated masks with specific object categories. To address this limitation, we
propose U-SAM, a novel framework that instills semantic awareness into SAM,
enabling it to generate targeted masks for user-specified object categories.
Given only object class names as input from the user, U-SAM provides
pixel-level semantic annotations for images without requiring any
labeled/unlabeled samples from the test data distribution. Our approach
leverages synthetically generated or web-crawled images to accumulate semantic
information about the desired object classes. We then learn a mapping function
between SAM's mask embeddings and object class labels, effectively enhancing
SAM with granularity-specific semantic recognition capabilities. As a result,
users can obtain meaningful and targeted segmentation masks for specific
objects they request, rather than generic and unlabeled masks. We evaluate
U-SAM on PASCAL VOC 2012 and MSCOCO-80, achieving significant mIoU improvements
of +17.95% and +5.20%, respectively, over state-of-the-art methods. By
transforming SAM into a semantically aware segmentation model, U-SAM offers a
practical and flexible solution for pixel-level annotation across diverse and
unseen domains in a resource-constrained environment.
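To make the core idea concrete, below is a minimal sketch (not the authors' code) of the kind of mapping the abstract describes: a small classifier head trained to map SAM mask embeddings to user-specified class labels. The 256-dimensional embedding size, the class names, and the placeholder data are all assumptions for illustration; in practice the embeddings would be extracted from SAM on synthetic or web-crawled images of the requested categories.

```python
# Minimal sketch: learn a mapping from SAM mask embeddings to user-specified
# object classes with a small classification head.
# Assumptions (not from the paper's released code): mask embeddings are
# 256-dimensional vectors already extracted from SAM; class names are given
# by the user; placeholder tensors stand in for real training data.
import torch
import torch.nn as nn

EMBED_DIM = 256                            # assumed SAM mask-embedding dimension
CLASS_NAMES = ["person", "dog", "car"]     # hypothetical user-specified classes


class MaskEmbeddingClassifier(nn.Module):
    """Maps a SAM mask embedding to logits over the requested classes."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, mask_embeddings: torch.Tensor) -> torch.Tensor:
        return self.head(mask_embeddings)


def train_step(model, optimizer, embeddings, labels):
    """One supervised step on (embedding, class-label) pairs gathered from
    synthetic or web-crawled images of the requested categories."""
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(embeddings), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = MaskEmbeddingClassifier(EMBED_DIM, len(CLASS_NAMES))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    # Placeholder data standing in for real SAM mask embeddings and labels.
    fake_embeddings = torch.randn(32, EMBED_DIM)
    fake_labels = torch.randint(0, len(CLASS_NAMES), (32,))
    print("loss:", train_step(model, opt, fake_embeddings, fake_labels))
```

At inference time, such a head would score each mask embedding produced by SAM against the user's class names, so that only masks matching the requested categories are returned with their labels.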