
How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

Abstract

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.
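The abstract defines the domain shattering dimension only informally. As a rough intuition aid, here is a toy Python sketch of a VC-style analogue, not the paper's formal definition: it assumes the "behavior" of a hypothesis on a domain is a single bit (error at most eps), and calls a set of domains shattered when every bit pattern over them is realized by some hypothesis. The names `error`, `behavior`, `is_shattered`, `domain_shattering_dim`, and the parameter `eps` are all hypothetical, introduced here for illustration.

```python
# Illustrative sketch only: an assumed VC-style analogue of domain shattering,
# not the paper's formal definition. Domains are represented by finite labeled
# samples, and a hypothesis's "behavior" on a domain is one bit: error <= eps.
from itertools import combinations
from typing import Callable, Sequence, Tuple

Hypothesis = Callable[[float], int]
# A finite labeled sample standing in for a data distribution (domain).
Domain = Sequence[Tuple[float, int]]

def error(h: Hypothesis, domain: Domain) -> float:
    """Empirical error of h on the sample representing the domain."""
    return sum(h(x) != y for x, y in domain) / len(domain)

def behavior(h: Hypothesis, domains: Sequence[Domain], eps: float) -> Tuple[int, ...]:
    """One bit per domain: does h achieve error at most eps on it?"""
    return tuple(int(error(h, d) <= eps) for d in domains)

def is_shattered(domains: Sequence[Domain], hypotheses: Sequence[Hypothesis], eps: float) -> bool:
    """Assumed VC-style criterion: all 2^k patterns over the k domains occur."""
    patterns = {behavior(h, domains, eps) for h in hypotheses}
    return len(patterns) == 2 ** len(domains)

def domain_shattering_dim(all_domains: Sequence[Domain],
                          hypotheses: Sequence[Hypothesis],
                          eps: float) -> int:
    """Largest k such that some k of the given domains are shattered (brute force)."""
    for k in range(len(all_domains), 0, -1):
        if any(is_shattered(list(subset), hypotheses, eps)
               for subset in combinations(all_domains, k)):
            return k
    return 0

if __name__ == "__main__":
    # Two toy domains with conflicting labels, plus simple threshold classifiers.
    d1 = [(0.1, 0), (0.9, 1)]
    d2 = [(0.1, 1), (0.9, 0)]
    hs = [lambda x, t=t: int(x > t) for t in (0.0, 0.5, 1.0)]
    hs += [lambda x, t=t: int(x <= t) for t in (0.0, 0.5, 1.0)]
    # No hypothesis is simultaneously perfect on both domains, so under this
    # toy definition the dimension is 1.
    print(domain_shattering_dim([d1, d2], hs, eps=0.0))
```

The brute-force search is exponential in the number of domains, which is fine for a toy illustration; the point is only to show how a shattering-style combinatorial quantity over domains, rather than over data points, could be computed in principle.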
Authors: Cynthia Dwork, Lunjia Hu, Han Shao
Submitted: June 20, 2025
arXiv Category: cs.LG

Key Contributions

Introduces the 'domain shattering dimension', a new combinatorial measure that characterizes domain generalization; shows that this dimension determines the domain sample complexity; and establishes a tight quantitative relationship with the classic VC dimension, proving that learnability in the standard PAC setting implies learnability in the domain generalization setting.
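For context on what a "tight characterization via a dimension" typically looks like, the classic VC theorem pins down the realizable PAC sample complexity as shown below. This is a known result stated for analogy, not a formula from this paper; the paper's domain-level bounds play the corresponding role with the domain shattering dimension in place of the VC dimension, and the exact rates are given in the paper itself.

```latex
% Known VC characterization of realizable PAC sample complexity, shown for
% analogy only; the paper's result controls the number of sampled *domains*
% via the domain shattering dimension instead of d_VC.
\[
  m_{\mathcal{H}}(\varepsilon, \delta)
  \;=\;
  \Theta\!\left( \frac{d_{\mathrm{VC}}(\mathcal{H}) + \log(1/\delta)}{\varepsilon} \right)
\]
```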

Business Value

Provides theoretical foundations for building more robust AI systems that can generalize well to unseen data distributions, reducing the need for extensive retraining or domain adaptation.