Abstract
Graph neural networks (GNNs) have become the de facto model for learning from
structured data. However, the decision-making process of GNNs remains opaque to
the end user, which undermines their use in safety-critical applications.
Several explainable AI techniques for graphs have been developed to address
this major issue. Focusing on graph classification, these explainers identify
subgraph motifs that explain predictions. Robust benchmarking of graph
explainers is therefore required to ensure that the produced explanations are of
high quality, i.e., aligned with the GNN's decision process. However, current
graph-XAI benchmarks are limited to simplistic synthetic datasets or a few
real-world tasks curated by domain experts, hindering rigorous and reproducible
evaluation, and consequently stalling progress in the field. To overcome these
limitations, we propose a method to automate the construction of graph XAI
benchmarks from generic graph classification datasets. Our approach leverages
the Weisfeiler-Leman color refinement algorithm to efficiently perform
approximate subgraph matching and mine class-discriminating motifs, which serve
as proxy ground-truth class explanations. At the same time, we ensure that
these motifs can be learned by GNNs because their discriminating power aligns
with WL expressiveness. This work also introduces the OpenGraphXAI benchmark
suite, which consists of 15 ready-made graph-XAI datasets derived by applying
our method to real-world molecular classification datasets. The suite is
available to the public along with a codebase to generate over 2,000 additional
graph-XAI benchmarks. Finally, we present a use case that illustrates how the
suite can be used to assess the effectiveness of a selection of popular graph
explainers, demonstrating the critical role of a sufficiently large benchmark
collection for improving the significance of experimental results.
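To make the underlying primitive concrete, the snippet below is a minimal sketch of 1-WL color refinement on a node-labeled graph, the routine the proposed pipeline builds on to compare substructures across graphs. The function names, the toy graph, and the shared-palette trick are illustrative assumptions, not code from the OpenGraphXAI repository.

```python
# Minimal sketch of 1-WL color refinement (illustrative; not the paper's code).
def wl_color_refinement(adjacency, initial_labels, num_iterations=3, palette=None):
    """Iteratively refine node colors: after k iterations, two nodes share a
    color only if 1-WL cannot distinguish their k-hop neighborhoods.

    adjacency:      dict node -> list of neighbor nodes
    initial_labels: dict node -> initial label (e.g., atom type)
    palette:        shared signature-to-color table; pass the same dict for
                    several graphs to make their colors directly comparable
    """
    if palette is None:
        palette = {}
    colors = dict(initial_labels)
    for _ in range(num_iterations):
        new_colors = {}
        for node in adjacency:
            # Signature = own color + sorted multiset of neighbor colors.
            signature = (colors[node], tuple(sorted(colors[n] for n in adjacency[node])))
            # Compress each unseen signature into a fresh integer color.
            new_colors[node] = palette.setdefault(signature, len(palette))
        colors = new_colors
    return colors

# Toy usage: a 4-node "molecule" given as the path C-C-O-C.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = {0: "C", 1: "C", 2: "O", 3: "C"}
print(wl_color_refinement(adjacency, labels))
```

Because nodes that end up with the same color root 1-WL-indistinguishable neighborhoods, comparing colors across graphs gives an efficient stand-in for approximate subgraph matching; this is also the intuition behind the abstract's claim that the mined motifs remain learnable by message-passing GNNs, whose expressiveness is bounded by 1-WL.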
Authors
Michele Fontanesi
Alessio Micheli
Marco Podda
Domenico Tortorella
Key Contributions
This paper proposes a method to automate the construction of graph-XAI benchmarks from generic graph classification datasets using the Weisfeiler-Leman color refinement algorithm. This addresses the limitations of current benchmarks, which are either simplistic or expert-curated, by enabling systematic, reproducible, and rigorous evaluation of graph explainability techniques.
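As a hedged illustration of how proxy ground-truth motifs enable quantitative evaluation, the sketch below scores an explainer by comparing the nodes it highlights against the mined motif of a graph. The node-level F1 metric and all names here are assumptions chosen for illustration; the evaluation protocol actually shipped with OpenGraphXAI may differ.

```python
# Illustrative scoring of an explanation against a proxy ground-truth motif
# (assumed metric: node-level F1; not necessarily the suite's own protocol).
def explanation_f1(predicted_nodes, motif_nodes):
    """Overlap between the node set an explainer highlights and the
    benchmark's proxy ground-truth motif for the same graph."""
    predicted, truth = set(predicted_nodes), set(motif_nodes)
    if not predicted or not truth:
        return 0.0
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy usage: the explainer selects nodes {2, 3, 5}; the mined motif spans {3, 5, 7}.
print(explanation_f1({2, 3, 5}, {3, 5, 7}))  # ~0.67
```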
Business Value
Increases trust and adoption of GNNs in safety-critical applications by providing tools to understand and verify their decision-making processes.