
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

Abstract

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.
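The abstract defines the domain shattering dimension only informally. As a rough intuition aid, here is a toy Python sketch of a VC-style analogue, not the paper's formal definition: it assumes the "behavior" of a hypothesis on a domain is a single bit (error at most eps), and calls a set of domains shattered when every bit pattern over them is realized by some hypothesis. The names `error`, `behavior`, `is_shattered`, `domain_shattering_dim`, and the parameter `eps` are all hypothetical, introduced here for illustration.

```python
# Illustrative sketch only: an assumed VC-style analogue of domain shattering,
# not the paper's formal definition. Domains are represented by finite labeled
# samples, and a hypothesis's "behavior" on a domain is one bit: error <= eps.
from itertools import combinations
from typing import Callable, Sequence, Tuple

Hypothesis = Callable[[float], int]
# A finite labeled sample standing in for a data distribution (domain).
Domain = Sequence[Tuple[float, int]]

def error(h: Hypothesis, domain: Domain) -> float:
    """Empirical error of h on the sample representing the domain."""
    return sum(h(x) != y for x, y in domain) / len(domain)

def behavior(h: Hypothesis, domains: Sequence[Domain], eps: float) -> Tuple[int, ...]:
    """One bit per domain: does h achieve error at most eps on it?"""
    return tuple(int(error(h, d) <= eps) for d in domains)

def is_shattered(domains: Sequence[Domain], hypotheses: Sequence[Hypothesis], eps: float) -> bool:
    """Assumed VC-style criterion: all 2^k patterns over the k domains occur."""
    patterns = {behavior(h, domains, eps) for h in hypotheses}
    return len(patterns) == 2 ** len(domains)

def domain_shattering_dim(all_domains: Sequence[Domain],
                          hypotheses: Sequence[Hypothesis],
                          eps: float) -> int:
    """Largest k such that some k of the given domains are shattered (brute force)."""
    for k in range(len(all_domains), 0, -1):
        if any(is_shattered(list(subset), hypotheses, eps)
               for subset in combinations(all_domains, k)):
            return k
    return 0

if __name__ == "__main__":
    # Two toy domains with conflicting labels, plus simple threshold classifiers.
    d1 = [(0.1, 0), (0.9, 1)]
    d2 = [(0.1, 1), (0.9, 0)]
    hs = [lambda x, t=t: int(x > t) for t in (0.0, 0.5, 1.0)]
    hs += [lambda x, t=t: int(x <= t) for t in (0.0, 0.5, 1.0)]
    # No hypothesis is simultaneously perfect on both domains, so under this
    # toy definition the dimension is 1.
    print(domain_shattering_dim([d1, d2], hs, eps=0.0))
```

The brute-force search is exponential in the number of domains, which is fine for a toy illustration; the point is only to show how a shattering-style combinatorial quantity over domains, rather than over data points, could be computed in principle.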
Authors: Cynthia Dwork, Lunjia Hu, Han Shao
Submitted: June 20, 2025
arXiv Category: cs.LG

Key Contributions

Introduces the 'domain shattering dimension', a new combinatorial measure that characterizes domain generalization; shows that this dimension determines the domain sample complexity; and establishes a tight quantitative relationship with the classic VC dimension, proving that learnability in the standard PAC setting implies learnability in the domain generalization setting.
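For context on what a "tight characterization via a dimension" typically looks like, the classic VC theorem pins down the realizable PAC sample complexity as shown below. This is a known result stated for analogy, not a formula from this paper; the paper's domain-level bounds play the corresponding role with the domain shattering dimension in place of the VC dimension, and the exact rates are given in the paper itself.

```latex
% Known VC characterization of realizable PAC sample complexity, shown for
% analogy only; the paper's result controls the number of sampled *domains*
% via the domain shattering dimension instead of d_VC.
\[
  m_{\mathcal{H}}(\varepsilon, \delta)
  \;=\;
  \Theta\!\left( \frac{d_{\mathrm{VC}}(\mathcal{H}) + \log(1/\delta)}{\varepsilon} \right)
\]
```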

Business Value

Provides theoretical foundations for building more robust AI systems that can generalize well to unseen data distributions, reducing the need for extensive retraining or domain adaptation.