📄 Abstract
Single-source Domain Generalized Object Detection (SDGOD), as a cutting-edge
research topic in computer vision, aims to enhance model generalization
capability in unseen target domains through single-source domain training.
Current mainstream approaches attempt to mitigate domain discrepancies via data
augmentation techniques. However, due to domain shift and limited
domain-specific knowledge, models tend to fall into the pitfall of spurious
correlations. This manifests as the model's over-reliance on simplistic
classification features (e.g., color) rather than essential domain-invariant
representations like object contours. To address this critical challenge, we
propose the Cauvis (Causal Visual Prompts) method. First, we introduce a
Cross-Attention Prompts module that mitigates bias from spurious features by
integrating visual prompts with cross-attention. To address the inadequate
domain knowledge coverage and spurious feature entanglement in visual prompts
for single-domain generalization, we propose a dual-branch adapter that
disentangles causal-spurious features while achieving domain adaptation via
high-frequency feature extraction. Cauvis achieves state-of-the-art performance
with 15.9-31.4% gains over existing domain generalization methods on SDGOD
datasets, while exhibiting significant robustness advantages in complex
interference environments.
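The abstract does not specify how the Cross-Attention Prompts module is wired, so the following is only a minimal sketch of generic scaled dot-product cross-attention between a few visual prompt vectors and image feature tokens. It assumes the prompts act as queries and the image features act as keys and values, omits the learned projection matrices, and adds a residual connection; all of these choices, and the function names, are illustrative assumptions rather than the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(prompts, features):
    """Let each prompt vector attend over the image feature tokens.

    Assumption: prompts are queries, features are keys and values.
    Learned Q/K/V projections are left out to keep the sketch short.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    updated, weights = [], []
    for prompt in prompts:
        # Attention weights of this prompt over all feature tokens.
        w = softmax([dot(prompt, f) * scale for f in features])
        # Weighted sum of the feature tokens (the attended context).
        context = [sum(wi * f[j] for wi, f in zip(w, features)) for j in range(dim)]
        # Residual connection: the prompt is refined, not replaced.
        updated.append([p + c for p, c in zip(prompt, context)])
        weights.append(w)
    return updated, weights


# Hypothetical toy inputs: two 2-d prompts and three 2-d feature tokens.
prompts = [[1.0, 0.0], [0.0, 1.0]]
features = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
updated, weights = cross_attention(prompts, features)
```

In this toy setup each prompt puts most of its attention on the feature token it is best aligned with, which is the basic mechanism by which prompts can be steered toward some image features and away from others.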
Authors (6)
Chen Li
Huiying Xu
Changxin Gao
Zeyu Wang
Yun Liu
Xinzhong Zhu
Submitted
October 22, 2025
Key Contributions
Proposes Cauvis (Causal Visual Prompts), a method for Single-Source Domain Generalized Object Detection (SDGOD). A Cross-Attention Prompts module integrates visual prompts with cross-attention to reduce bias from spurious features, and a dual-branch adapter disentangles causal from spurious features while adapting to the domain through high-frequency feature extraction. The goal is to learn domain-invariant representations based on essential features such as object contours rather than superficial ones such as color.
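To illustrate why high-frequency content tracks contours rather than color, here is a small sketch of a high-pass filter computed as the image minus a 3x3 box blur. This is a stand-in chosen for brevity; the source does not say how Cauvis extracts high-frequency features, and the function names and toy images below are hypothetical. The point is only that a uniform intensity shift (a crude proxy for a color or lighting change) leaves the high-pass response unchanged, while an edge produces a clear response.

```python
def box_blur(img):
    """3x3 mean filter; border pixels reuse the nearest valid pixel."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def high_pass(img):
    """High-frequency residual: original image minus its blurred version."""
    low = box_blur(img)
    return [[img[y][x] - low[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


# Toy 4x6 grayscale image with a vertical edge between columns 2 and 3.
edge = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]
# Same scene with a uniform brightness offset (proxy for a color shift).
shifted = [[v + 0.5 for v in row] for row in edge]
# Flat image with no contours at all.
flat = [[0.5] * 6 for _ in range(4)]

hp_edge = high_pass(edge)
hp_shifted = high_pass(shifted)
hp_flat = high_pass(flat)
```

Here `hp_edge` and `hp_shifted` agree up to floating-point error, and `hp_flat` is zero everywhere, so a feature built on this residual responds to the contour and ignores the global offset.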
Business Value
Enhances the robustness and reliability of object detection systems in real-world scenarios where environmental conditions or object appearances can vary significantly. This is critical for applications like autonomous driving, where reliable perception is paramount.