Abstract: As an important task in intelligent transportation systems, Aerial-Ground
person Re-IDentification (AG-ReID) aims to retrieve specific persons across
heterogeneous cameras with different viewpoints. Previous methods typically adopt
deep learning-based models, focusing on extracting view-invariant features.
However, they usually overlook the semantic information in person attributes.
In addition, existing training strategies often rely on fully fine-tuning
large-scale models, which significantly increases training costs. To address
these issues, we propose a novel framework named LATex for AG-ReID, which
adopts prompt-tuning strategies to leverage attribute-based text knowledge.
More specifically, we first introduce the Contrastive Language-Image
Pre-training (CLIP) model as the backbone, and propose an Attribute-aware Image
Encoder (AIE) to extract both global semantic features and attribute-aware
features from input images. Then, with these features, we propose a Prompted
Attribute Classifier Group (PACG) to predict person attributes and obtain
attribute representations. Finally, we design a Coupled Prompt Template (CPT)
to transform attribute representations and view information into structured
sentences. These sentences are processed by the text encoder of CLIP to
generate more discriminative features. As a result, our framework can fully
leverage attribute-based text knowledge to improve AG-ReID performance.
Extensive experiments on three AG-ReID benchmarks demonstrate the effectiveness
of our proposed method. The source code will be made available.
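
For intuition, the following is a minimal sketch of the pipeline described above. The component names (AIE, PACG, CPT) follow the abstract, but the attribute set, tensor shapes, and the simple linear layers standing in for CLIP's image encoder are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hypothetical attribute set; the real benchmarks define their own label spaces.
ATTRIBUTES = {"gender": 2, "backpack": 2, "upper_color": 8}
EMBED_DIM = 512


class AttributeAwareImageEncoder(nn.Module):
    """Stand-in for the AIE: one global feature plus one feature per attribute."""

    def __init__(self, dim: int = EMBED_DIM):
        super().__init__()
        # Placeholder for the CLIP image encoder backbone.
        self.backbone = nn.Linear(3 * 224 * 224, dim)
        self.attr_heads = nn.ModuleList([nn.Linear(dim, dim) for _ in ATTRIBUTES])

    def forward(self, images: torch.Tensor):
        global_feat = self.backbone(images.flatten(1))  # global semantic feature
        attr_feats = [head(global_feat) for head in self.attr_heads]  # attribute-aware features
        return global_feat, attr_feats


class PromptedAttributeClassifierGroup(nn.Module):
    """Stand-in for the PACG: predicts each attribute from its attribute-aware feature."""

    def __init__(self, dim: int = EMBED_DIM):
        super().__init__()
        self.classifiers = nn.ModuleList([nn.Linear(dim, n) for n in ATTRIBUTES.values()])

    def forward(self, attr_feats):
        return [clf(f) for clf, f in zip(self.classifiers, attr_feats)]


def coupled_prompt_template(attr_preds: dict, view: str) -> str:
    """Stand-in for the CPT: folds predicted attributes and view information into a
    structured sentence that would then be passed to the CLIP text encoder."""
    attrs = ", ".join(f"{name} class {idx}" for name, idx in attr_preds.items())
    return f"A photo of a person captured from the {view} view, with {attrs}."


# Usage: images -> AIE -> PACG -> CPT sentence (input to the CLIP text encoder).
images = torch.randn(4, 3, 224, 224)
aie = AttributeAwareImageEncoder()
pacg = PromptedAttributeClassifierGroup()
global_feat, attr_feats = aie(images)
logits = pacg(attr_feats)
preds = {name: int(l[0].argmax()) for name, l in zip(ATTRIBUTES, logits)}
print(coupled_prompt_template(preds, view="aerial"))
```

In the actual framework, prompt tuning implies that learnable prompt tokens and attribute representations, rather than the hand-written string above, would form the text-encoder input; this sketch only illustrates the overall data flow from image to structured prompt.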