arxiv_ml 95% Match Research Paper Graph Machine Learning Researchers,Data Scientists,ML Engineers working with graph data 1 week ago

Graph Data Selection for Domain Adaptation: A Model-Free Approach

graph-neural-networks › graph-learning

📄 Abstract

Abstract: Graph domain adaptation (GDA) is a fundamental task in graph machine learning, with techniques like shift-robust graph neural networks (GNNs) and specialized training procedures to tackle the distribution shift problem. Although these model-centric approaches show promising results, they often struggle with severe shifts and constrained computational resources. To address these challenges, we propose a novel model-free framework, GRADATE (GRAph DATa sElector), that selects the best training data from the source domain for the classification task on the target domain. GRADATE picks training samples without relying on any GNN model's predictions or training recipes, leveraging optimal transport theory to capture and adapt to distribution changes. GRADATE is data-efficient, scalable and meanwhile complements existing model-centric GDA approaches. Through comprehensive empirical studies on several real-world graph-level datasets and multiple covariate shift types, we demonstrate that GRADATE outperforms existing selection methods and enhances off-the-shelf GDA methods with much fewer training data.

Authors (3)

Ting-Wei Li

Ruizhong Qiu

Hanghang Tong

Submitted

May 22, 2025

arXiv Category

cs.LG

arXiv PDF

Key Contributions

This paper proposes GRADATE, a novel model-free framework for graph domain adaptation (GDA) that selects optimal training data from the source domain for a target domain task. Unlike model-centric approaches, GRADATE uses optimal transport theory to adapt to distribution changes without relying on GNN predictions or training recipes. It is data-efficient, scalable, and complements existing GDA methods.

Business Value

Enables the effective application of GNNs to new domains where labeled data is scarce or differs significantly from existing datasets. This accelerates the adoption of GNNs in various industries by reducing the need for extensive retraining or data collection.

Paper Metadata

Innovation Type

Algorithmic

Deployment Feasibility

High. The model-free nature and focus on data selection make it adaptable to various GNN architectures and tasks. Its scalability is a key advantage for real-world applications.

Limitations Addressed

Struggles of model-centric GDA with severe shifts,Constrained computational resources in GDA,Need for model-free approaches to domain adaptation

Performance Gains

improved performance in graph domain adaptation,data-efficient and scalable

Technical Tags

graph domain adaptation (GDA)model-freedata selectionoptimal transportsource domaintarget domaindistribution shiftgraph neural networks (GNNs)data-efficientscalable

Research Topics

Graph Machine LearningDomain AdaptationTransfer LearningData SelectionDistribution Shift

Methods & Architectures

GRADATE (GRAph DATa SElector)Optimal Transport TheoryModel-free data selection Graph Neural Networks (GNNs)

Applications & Tasks

Social Networks Recommendation Systems Drug Discovery Cybersecurity Adapting graph models to new domainsHandling distribution shift in graph dataSelecting optimal training data Graph Domain AdaptationGraph ClassificationNode Classification

Datasets & Benchmarks

Datasets

real-world graph-level datasets

performance on graph-level datasetsrobustness to covariate shift types

Related Fields

Machine LearningTransfer LearningOptimizationData Mining

Keywords

graph domain adaptationGDAmodel-freedata selectionoptimal transportdistribution shiftgraph neural networksGNNssource domaintarget domaindata efficiencyscalability

Academic Context

#Graph Machine Learning#Domain Adaptation#Transfer Learning#Data Selection#Distribution Shift

Commercial Potential

Potential Products

GNN model adaptation toolsData selection platforms for graph learningDomain adaptation services for GNNs

Target Industries

Social MediaE-commerceBiotechnologyCybersecurityTelecommunications

Use Case Examples

Adapting a GNN trained on one social network to analyze anotherUsing GNNs for drug discovery by adapting models across different molecular datasetsDetecting fraudulent transactions in a new financial system using existing GNN models

Competitive Edge

Provides a novel model-free approach to GDA, complementing existing model-centric methods and offering advantages in data efficiency and scalability.

Market Opportunity

Growing market for GNN applications and domain adaptation solutions.

Revenue Models

Licensing of the GRADATE frameworkconsulting services.

Resource Requirements

Compute Needs

Moderate (for data selection process)

Data Requirements

Source and target domain graph datasets

Deployment Constraints

Requires access to both source and target domain data for selection.

Scalability

Designed to be scalable, particularly due to its model-free nature.

Production Readiness

Maturity Level

Research

Time to Market

1-2 years

Patent Potential

Moderate

View Full Paper Back to Papers