📄 Abstract
We present PromptGAR, a novel framework for Group Activity Recognition (GAR)
that offers both input flexibility and high recognition accuracy.
Existing approaches suffer from limited real-world applicability due to their
reliance on full prompt annotations, a fixed number of frames and instances, and
a lack of actor consistency. To bridge this gap, we propose PromptGAR, which
is the first GAR model to provide input flexibility across prompts, frames, and
instances without the need for retraining. We leverage diverse visual prompts,
like bounding boxes, skeletal keypoints, and instance identities, by unifying
them as point prompts. A recognition decoder then cross-updates class and
prompt tokens for enhanced performance. To ensure actor consistency for
extended activity durations, we also introduce a relative instance attention
mechanism that directly encodes instance identities. Comprehensive evaluations
demonstrate that PromptGAR achieves competitive performance on both full
and partial prompt inputs, establishing its effectiveness in input
flexibility and generalization for real-world applications.
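
As a rough illustration of what "unifying prompts as point prompts" could look like, the sketch below flattens bounding boxes and skeletal keypoints into a single list of tagged points, each carrying a frame index and an instance identity. The abstract does not specify the encoding, so everything here is an assumption: the `PointPrompt` record, the `to_point_prompts` helper, the choice to represent a box by its two corners, and the kind labels are all hypothetical and are not taken from the PromptGAR implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical kind labels; the abstract does not say how prompt types are told apart.
BOX_TOP_LEFT = "box_tl"
BOX_BOTTOM_RIGHT = "box_br"
KEYPOINT = "kpt"


@dataclass(frozen=True)
class PointPrompt:
    """One point prompt: where it is, what it is, and whose it is (all assumed fields)."""
    frame: int
    instance_id: int
    kind: str
    x: float
    y: float


def to_point_prompts(
    frame: int,
    instance_id: int,
    box: Optional[Tuple[float, float, float, float]] = None,
    keypoints: Optional[Sequence[Tuple[float, float]]] = None,
) -> List[PointPrompt]:
    """Flatten whatever prompts are available for one actor into a list of points.

    Assumption: a box (x0, y0, x1, y1) becomes its two corner points, and each
    keypoint becomes one point. Missing prompts simply contribute no points,
    so the output length varies with the input.
    """
    points: List[PointPrompt] = []
    if box is not None:
        x0, y0, x1, y1 = box
        points.append(PointPrompt(frame, instance_id, BOX_TOP_LEFT, x0, y0))
        points.append(PointPrompt(frame, instance_id, BOX_BOTTOM_RIGHT, x1, y1))
    for x, y in keypoints or ():
        points.append(PointPrompt(frame, instance_id, KEYPOINT, x, y))
    return points


# Full prompts for actor 7 in frame 0: a box plus two keypoints.
full = to_point_prompts(0, 7, box=(10, 20, 50, 80), keypoints=[(30, 25), (30, 60)])
# Partial prompts for the same actor in frame 1: keypoints only, no box.
partial = to_point_prompts(1, 7, keypoints=[(31, 26)])
```

Because every prompt type reduces to the same record, a downstream model that consumes a variable-length set of such points would not need a fixed number of prompts, frames, or instances. This is one plausible reading of the input flexibility the abstract describes, not a statement of how the model is built.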