Abstract
We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional manifold $\mathcal{M} \subset \mathbb{R}^F$. The manifold is approximated in an unsupervised manner using a variational autoencoder (VAE), whose trained encoder maps data points to embeddings that approximate their coordinates on $\mathcal{M}$. A geometric graph is constructed with edges carrying Gaussian kernel weights that decay with distance in the embedding space, transforming the point classification problem into a semi-supervised node classification task on the graph. This task is solved using a graph neural network (GNN). Our main contribution is a theoretical analysis of the statistical generalization properties of this data-to-manifold-to-graph pipeline. We show that, under uniform sampling from $\mathcal{M}$, the generalization gap of the semi-supervised task diminishes with increasing graph size, up to the GNN training error. Leveraging a training procedure that resamples a slightly larger graph at regular intervals during training, we then show that the generalization gap can be reduced further, vanishing asymptotically. Finally, we validate our findings with numerical experiments on image classification benchmarks, demonstrating the empirical effectiveness of our approach.
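To make the graph-construction step concrete, here is a minimal sketch, assuming the VAE embeddings are already available as a NumPy array. The dense adjacency matrix and the bandwidth parameter epsilon are illustrative choices, not prescriptions from the paper.

import numpy as np

def gaussian_graph(embeddings: np.ndarray, epsilon: float) -> np.ndarray:
    """Build a dense adjacency matrix with Gaussian kernel weights
    w_ij = exp(-||z_i - z_j||^2 / epsilon), so edge weights decay
    with distance in the embedding space."""
    # Pairwise squared Euclidean distances between embeddings, shape (n, n).
    sq_dists = np.sum((embeddings[:, None, :] - embeddings[None, :, :]) ** 2, axis=-1)
    weights = np.exp(-sq_dists / epsilon)
    np.fill_diagonal(weights, 0.0)  # no self-loops
    return weights

A call such as gaussian_graph(encoder(x), epsilon=0.1) (with encoder denoting a hypothetical trained VAE encoder) would yield the weighted adjacency on which the semi-supervised node classification task is then posed.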
Authors (3)
Caio F. Deberaldini Netto
Zhiyang Wang
Luana Ruiz
Key Contributions
Proposes a graph semi-supervised learning framework that approximates data manifolds using VAEs and constructs graphs in the embedding space for node classification with GNNs. Provides a theoretical analysis of the statistical generalization properties of this data-to-manifold-to-graph pipeline, showing that the generalization gap diminishes with increasing graph size under uniform sampling.
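The stronger result on the asymptotically vanishing generalization gap relies on periodically resampling a slightly larger graph during training. Below is a minimal sketch of such a training loop, written in PyTorch purely for illustration; the helpers sample_points, encode, build_graph, and labels_of are hypothetical stand-ins for the paper's data sampling, VAE encoding, graph construction (analogous to the sketch above, but returning torch tensors), and label lookup, and the growth schedule is an assumption rather than the paper's prescription.

import torch
import torch.nn as nn

class SimpleGNN(nn.Module):
    """A one-layer graph convolution producing node logits: (A X) W + b."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, num_classes)

    def forward(self, adj: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        return self.lin(adj @ feats)

def train_with_resampling(model, n0, growth, resample_every, steps,
                          sample_points, encode, build_graph, labels_of):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    n = n0
    for step in range(steps):
        if step % resample_every == 0:
            x = sample_points(n)        # draw n data points (e.g., from the manifold)
            z = encode(x)               # VAE embeddings, float tensor of shape (n, d)
            adj = build_graph(z)        # Gaussian-weighted adjacency, shape (n, n)
            y, mask = labels_of(x)      # class indices for all nodes + boolean mask of labeled nodes
            n = int(n * growth)         # next resampled graph is slightly larger
        logits = model(adj, z)
        loss = nn.functional.cross_entropy(logits[mask], y[mask])  # loss on labeled nodes only
        opt.zero_grad()
        loss.backward()
        opt.step()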
Business Value
Enhances the accuracy and robustness of classification on complex, high-dimensional data that lies on low-dimensional manifolds, and improves the ability to leverage large amounts of unlabeled data.