Abstract: Feature foundation models (usually vision transformers) offer rich semantic
descriptors of images, useful for downstream tasks such as (interactive)
segmentation and object detection. For computational efficiency, these
descriptors are often patch-based and so struggle to represent the fine
features often present in micrographs; they also struggle with the large image
sizes present in materials and biological image analysis. In this work, we
train a convolutional neural network to upsample low-resolution (i.e., large
patch size) foundation model features with reference to the input image. We
apply this upsampler network (without any further training) to efficiently
featurise and then segment a variety of microscopy images, including plant
cells, a lithium-ion battery cathode and organic crystals. The richness of
these upsampled features admits separation of hard-to-segment phases, such as
hairline cracks. We demonstrate that interactive segmentation with these deep
features produces high-quality segmentations far faster and with far fewer
labels than training or finetuning a more traditional convolutional network.
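
As a purely illustrative sketch of the pipeline the abstract describes, the snippet below upsamples a coarse patch-feature grid with a small image-conditioned CNN and then performs interactive segmentation by fitting a lightweight classifier to a handful of labelled pixels. The architecture, the feature dimension (384, as in a ViT-S backbone), the 14-pixel patch size, and the random-forest classifier are all assumptions for illustration, not the paper's implementation; in practice the upsampler would be trained first and the features would come from a real foundation model.

```python
# Illustrative sketch only -- not the authors' code. Names, sizes, and the
# choice of classifier are assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.ensemble import RandomForestClassifier


class FeatureUpsampler(nn.Module):
    """Upsample a coarse patch-feature grid to pixel resolution,
    conditioned on the input image (hypothetical architecture)."""

    def __init__(self, feat_dim=384, img_channels=3, hidden=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(feat_dim + img_channels, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, feat_dim, 3, padding=1),
        )

    def forward(self, patch_feats, image):
        # patch_feats: (B, C, h, w) coarse grid from a ViT backbone;
        # image: (B, 3, H, W) full-resolution micrograph.
        up = F.interpolate(patch_feats, size=image.shape[-2:],
                           mode="bilinear", align_corners=False)
        # Concatenating the image lets the network sharpen feature
        # boundaries along real image edges.
        return self.refine(torch.cat([up, image], dim=1))


# Stand-in tensors; real use would load a micrograph and extract
# foundation-model features.
H = W = 224
image = torch.rand(1, 3, H, W)
patch_feats = torch.rand(1, 384, H // 14, W // 14)  # 14 px patches

with torch.no_grad():
    feats = FeatureUpsampler()(patch_feats, image)  # (1, 384, H, W)

# "Interactive" step: fit a light classifier on a few labelled pixels
# (random stand-ins for user scribbles) and predict everywhere else.
X = feats[0].permute(1, 2, 0).reshape(-1, 384).numpy()
scribbles = np.random.choice(H * W, size=200, replace=False)
labels = np.random.randint(0, 2, size=200)  # toy two-phase labels

clf = RandomForestClassifier(n_estimators=50).fit(X[scribbles], labels)
mask = clf.predict(X).reshape(H, W)  # per-pixel segmentation
```

Because the classifier sees one rich descriptor per pixel, a few hundred scribbled labels can stand in for the far larger annotation budget a convolutional network trained or finetuned from scratch would need, which is the saving the abstract claims.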