Towards Single-Source Domain Generalized Object Detection via Causal Visual Prompts

📄 Abstract

Single-source Domain Generalized Object Detection (SDGOD), as a cutting-edge research topic in computer vision, aims to enhance model generalization capability in unseen target domains through single-source domain training. Current mainstream approaches attempt to mitigate domain discrepancies via data augmentation techniques. However, due to domain shift and limited domain-specific knowledge, models tend to fall into the pitfall of spurious correlations. This manifests as the model's over-reliance on simplistic classification features (e.g., color) rather than essential domain-invariant representations like object contours. To address this critical challenge, we propose the Cauvis (Causal Visual Prompts) method. First, we introduce a Cross-Attention Prompts module that mitigates bias from spurious features by integrating visual prompts with cross-attention. To address the inadequate domain knowledge coverage and spurious feature entanglement in visual prompts for single-domain generalization, we propose a dual-branch adapter that disentangles causal-spurious features while achieving domain adaptation via high-frequency feature extraction. Cauvis achieves state-of-the-art performance with 15.9-31.4% gains over existing domain generalization methods on SDGOD datasets, while exhibiting significant robustness advantages in complex interference environments.
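The abstract does not say how the dual-branch adapter extracts high-frequency features. The sketch below assumes one common approach, an FFT high-pass filter over the spatial dimensions of a feature map, motivated by the general observation that edges and contours sit mostly in high spatial frequencies while color and smooth illumination sit mostly in low ones. The helper names `high_pass` and `dual_branch_adapter`, the cutoff `radius`, the 1x1 channel-mixing weights, and the fusion weight `alpha` are all hypothetical, and no causal/spurious disentanglement objective is modeled; this is an illustration, not the Cauvis adapter.

```python
import numpy as np


def high_pass(feat: np.ndarray, radius: float) -> np.ndarray:
    """Remove spatial frequencies within `radius` of the zero-frequency term.

    feat has shape (C, H, W). Hypothetical helper; the FFT high-pass filter is an
    assumed stand-in for the paper's high-frequency feature extraction.
    """
    _, h, w = feat.shape
    # Per-channel 2D FFT over (H, W), shifted so the DC term sits at (h // 2, w // 2).
    spec = np.fft.fftshift(np.fft.fft2(feat, axes=(-2, -1)), axes=(-2, -1))
    yy, xx = np.ogrid[:h, :w]
    # True outside the low-frequency disc, False inside it (DC is always removed).
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > radius ** 2
    spec = spec * mask  # the (H, W) mask broadcasts over channels
    out = np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1))
    return out.real  # imaginary part is only round-off for a symmetric mask


def dual_branch_adapter(feat, w_hf, w_full, radius=2.0, alpha=0.5):
    """Residual adapter with a high-frequency branch and a full-spectrum branch.

    w_hf and w_full are hypothetical (C, C) channel-mixing (1x1 conv) weights;
    alpha is a hypothetical fusion weight between the two branches.
    """
    # Branch 1: channel mixing applied to high-pass (edge/contour-like) content.
    hf = np.einsum("oc,chw->ohw", w_hf, high_pass(feat, radius))
    # Branch 2: channel mixing applied to the unfiltered features.
    full = np.einsum("oc,chw->ohw", w_full, feat)
    # Residual fusion keeps the backbone features and adds both branch outputs.
    return feat + alpha * hf + (1.0 - alpha) * full
```

Because `high_pass(feat, 0)` removes only the DC term, it returns each channel minus its spatial mean; a larger `radius` strips progressively more of the smooth, low-frequency content that carries cues such as overall color.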
Authors (6): Chen Li, Huiying Xu, Changxin Gao, Zeyu Wang, Yun Liu, Xinzhong Zhu
Submitted: October 22, 2025
arXiv Category: cs.CV

Key Contributions

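Proposes Cauvis (Causal Visual Prompts), a method for Single-Source Domain Generalized Object Detection (SDGOD) built from two components: a Cross-Attention Prompts module that integrates visual prompts with cross-attention to mitigate bias from spurious features, and a dual-branch adapter that disentangles causal and spurious features while adapting to new domains through high-frequency feature extraction. The aim is to learn domain-invariant representations that rely on essential cues such as object contours rather than superficial ones such as color. The authors report 15.9-31.4% gains over existing domain generalization methods on SDGOD datasets.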
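To make the idea of combining visual prompts with cross-attention concrete, here is a minimal single-head sketch in NumPy. It assumes that a small set of learnable prompt tokens supplies the keys and values while the backbone's image tokens supply the queries, with the result added back to the image tokens as a residual. The function names, shapes, and this query/key/value assignment are assumptions for illustration, not the published Cauvis architecture.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # shift by the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_prompts(tokens, prompts, w_q, w_k, w_v):
    """Single-head cross-attention from image tokens to learnable prompt tokens.

    tokens: (N, d) image features; prompts: (M, d) learnable prompts.
    w_q, w_k: (d, d_k); w_v: (d, d). All names and shapes are assumptions.
    Returns the residually updated tokens and the (N, M) attention weights.
    """
    q = tokens @ w_q   # queries come from the image features
    k = prompts @ w_k  # keys come from the prompts
    v = prompts @ w_v  # values come from the prompts
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each row sums to 1
    return tokens + attn @ v, attn
```

Setting `w_v` to zeros makes the prompt update vanish, so training would start from the unmodified backbone features; zero-initializing the output path is a common choice for adapter-style modules.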

Business Value

Could improve the robustness and reliability of object detection systems in real-world scenarios where environmental conditions or object appearance differ significantly from those seen in training. This matters for applications such as autonomous driving, where reliable perception is paramount.