Abstract
Graph Transformers (GTs) have emerged as powerful architectures for
graph-structured data, yet remain constrained by rigid designs and lack
quantifiable interpretability. Current state-of-the-art GTs commit to fixed GNN
types across all layers, missing potential benefits of depth-specific component
selection, while their complex architectures become opaque, so that performance
gains cannot be attributed with confidence to either meaningful patterns or
spurious correlations. We redesign GT attention through asymmetry, decoupling structural
encoding from feature representation: queries derive from node features while
keys and values come from GNN transformations. Within this framework, we use
Differentiable ARchiTecture Search (DARTS) to select optimal GNN operators at
each layer, enabling depth-wise heterogeneity inside transformer attention
itself (DARTS-GT). To understand discovered architectures, we develop the first
quantitative interpretability framework for GTs through causal ablation. Our
metrics (Head-deviation, Specialization, and Focus) identify which heads and
nodes drive predictions while enabling model comparison. Experiments across
eight benchmarks show DARTS-GT achieves state-of-the-art results on four datasets while
remaining competitive on others, with discovered architectures revealing
dataset-specific patterns. Our interpretability analysis reveals that visual
attention salience and causal importance do not always correlate, indicating
widely used visualization approaches may miss components that actually matter.
Crucially, heterogeneous architectures found by DARTS-GT consistently produced
more interpretable models than baselines, establishing that Graph Transformers
need not choose between performance and interpretability.
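
The asymmetric attention and per-layer operator search described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two candidate operators (`mean_aggregate`, `sum_aggregate`), the function names, and the omission of learned query/key/value projections are all assumptions made for brevity. It only shows the general idea of taking queries from raw node features, taking keys and values from a GNN transformation, and blending candidate GNN operators with a softmax over architecture parameters, as is standard in DARTS.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neighbourhood(adj, i):
    """Node i together with its neighbours in the adjacency matrix."""
    return [i] + [j for j, edge in enumerate(adj[i]) if edge and j != i]

def mean_aggregate(adj, feats):
    """Hypothetical candidate operator: average over the neighbourhood."""
    dim = len(feats[0])
    out = []
    for i in range(len(feats)):
        idx = neighbourhood(adj, i)
        out.append([sum(feats[j][d] for j in idx) / len(idx) for d in range(dim)])
    return out

def sum_aggregate(adj, feats):
    """Hypothetical candidate operator: sum over the neighbourhood."""
    dim = len(feats[0])
    out = []
    for i in range(len(feats)):
        idx = neighbourhood(adj, i)
        out.append([sum(feats[j][d] for j in idx) for d in range(dim)])
    return out

# Assumed search space; the paper's actual candidate set is not given here.
CANDIDATE_OPS = [mean_aggregate, sum_aggregate]

def mixed_gnn(adj, feats, alphas):
    """DARTS-style continuous relaxation: softmax-weighted sum of all operators."""
    weights = softmax(alphas)
    outputs = [op(adj, feats) for op in CANDIDATE_OPS]
    n, dim = len(feats), len(feats[0])
    return [[sum(w * o[i][d] for w, o in zip(weights, outputs)) for d in range(dim)]
            for i in range(n)]

def asymmetric_attention(adj, feats, alphas):
    """Queries from node features; keys and values from the mixed GNN output."""
    queries = feats                      # learned projections omitted for brevity
    keys_values = mixed_gnn(adj, feats, alphas)
    dim = len(feats[0])
    outputs, attention = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        attn = softmax(scores)
        attention.append(attn)
        outputs.append([sum(a * keys_values[j][d] for j, a in enumerate(attn))
                        for d in range(dim)])
    return outputs, attention

# Toy path graph with three nodes and two features per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layer_alphas = [0.0, 0.0]                # one alpha vector per layer in a real model
out, attn = asymmetric_attention(adj, feats, layer_alphas)
```

In a full DARTS setup, each layer would hold its own `alphas` vector, trained by gradient descent alongside the network weights, and the operator with the largest weight would be kept at each depth once the search finishes.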
Authors (2)
Shruti Sarika Chakraborty
Peter Minary
Submitted
October 16, 2025
Key Contributions
Introduces DARTS-GT, a framework that uses Differentiable Architecture Search (DARTS) to optimize Graph Transformers (GTs) by selecting optimal GNN operators at each layer, enabling depth-wise heterogeneity. It also proposes a novel asymmetric attention mechanism and a causal ablation method for quantifiable, instance-specific interpretability.
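
The causal ablation idea can be sketched generically as follows. This is an illustration of head ablation in general, not the paper's definition of Head-deviation, Specialization, or Focus, which this summary does not state. The helper `ablation_effects`, the stand-in model `toy_predict`, and its per-head contributions are hypothetical.

```python
def ablation_effects(predict, num_heads):
    """Measure how much the prediction changes when each head is switched off.

    `predict` takes a list of per-head masks (1.0 = active, 0.0 = ablated)
    and returns a scalar prediction.
    """
    baseline = predict([1.0] * num_heads)
    effects = []
    for h in range(num_heads):
        mask = [0.0 if i == h else 1.0 for i in range(num_heads)]
        effects.append(abs(predict(mask) - baseline))
    return effects

def toy_predict(mask):
    """Hypothetical stand-in for a model: a masked sum of head contributions."""
    contributions = [0.25, 2.0, 0.5]     # made-up values for illustration only
    return sum(m * c for m, c in zip(mask, contributions))

effects = ablation_effects(toy_predict, num_heads=3)
# Index of the head whose removal changes the prediction the most.
most_important = max(range(len(effects)), key=effects.__getitem__)
```

Because the effect is measured by intervening on the model rather than by reading attention weights, a head can rank as important here even if its attention map looks unremarkable, which is the distinction between salience and causal importance that the abstract draws.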
Business Value
Enables the development of more powerful and understandable AI models for analyzing complex graph-structured data, leading to better insights in fields like drug discovery and materials science.