Graph Semi-Supervised Learning for Point Classification on Data Manifolds

📄 Abstract

We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional manifold $\mathcal{M} \subset \mathbb{R}^F$. The manifold is approximated in an unsupervised manner using a variational autoencoder (VAE), where the trained encoder maps data to embeddings that represent their coordinates in $\mathbb{R}^F$. A geometric graph is constructed with Gaussian-weighted edges inversely proportional to distances in the embedding space, transforming the point classification problem into a semi-supervised node classification task on the graph. This task is solved using a graph neural network (GNN). Our main contribution is a theoretical analysis of the statistical generalization properties of this data-to-manifold-to-graph pipeline. We show that, under uniform sampling from $\mathcal{M}$, the generalization gap of the semi-supervised task diminishes with increasing graph size, up to the GNN training error. Leveraging a training procedure which resamples a slightly larger graph at regular intervals during training, we then show that the generalization gap can be reduced even further, vanishing asymptotically. Finally, we validate our findings with numerical experiments on image classification benchmarks, demonstrating the empirical effectiveness of our approach.
Authors (3): Caio F. Deberaldini Netto, Zhiyang Wang, Luana Ruiz
Submitted: June 13, 2025
arXiv Category: cs.LG

Key Contributions

Proposes a graph semi-supervised learning framework that approximates the data manifold with a VAE, builds a geometric graph in the embedding space, and classifies its nodes with a GNN. Provides a theoretical analysis of the statistical generalization of this data-to-manifold-to-graph pipeline: under uniform sampling from the manifold, the generalization gap shrinks as the graph grows, up to the GNN training error, and a training procedure that periodically resamples a slightly larger graph makes the gap vanish asymptotically.
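
The graph-construction step can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the kernel form exp(-d^2 / (2 * sigma^2)), the bandwidth parameter `sigma`, the hand-written toy embeddings that stand in for VAE encoder outputs, and the function names `gaussian_graph` and `propagate_labels`. The second function is a simple label-propagation stand-in for the GNN, used only to show how a few labeled nodes inform the unlabeled ones through the edge weights.

```python
import math


def gaussian_graph(embeddings, sigma=1.0):
    """Dense weight matrix with W[i][j] = exp(-||z_i - z_j||^2 / (2 sigma^2)).

    The kernel form and sigma are assumptions; the diagonal is left at zero.
    """
    n = len(embeddings)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
            w = math.exp(-sq_dist / (2.0 * sigma ** 2))
            weights[i][j] = weights[j][i] = w  # undirected graph
    return weights


def propagate_labels(weights, labels, num_classes, steps=20):
    """Stand-in for the GNN: repeated weighted averaging of class scores.

    `labels` maps node index -> class for the few labeled nodes, whose
    scores stay clamped to their one-hot label at every step.
    """
    n = len(weights)
    scores = [[0.0] * num_classes for _ in range(n)]
    for i, y in labels.items():
        scores[i][y] = 1.0
    for _ in range(steps):
        new_scores = []
        for i in range(n):
            if i in labels:
                new_scores.append(scores[i][:])  # keep known labels fixed
                continue
            degree = sum(weights[i])
            if degree == 0.0:
                new_scores.append([0.0] * num_classes)
                continue
            new_scores.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / degree
                for c in range(num_classes)
            ])
        scores = new_scores
    return [max(range(num_classes), key=lambda c: s[c]) for s in scores]


# Hypothetical 2-D embeddings forming two well-separated clusters.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
                  (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
W = gaussian_graph(toy_embeddings, sigma=1.0)
predictions = propagate_labels(W, labels={0: 0, 3: 1}, num_classes=2)
print(predictions)  # [0, 0, 0, 1, 1, 1]
```

With one labeled node per cluster, every unlabeled node takes the label of its own cluster because the weights between clusters are vanishingly small. In practice the dense matrix would usually be sparsified, and the propagation step would be replaced by a trained GNN.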

Business Value

May improve classification accuracy on complex, high-dimensional data that lies near a low-dimensional manifold. Because only a few nodes need labels, the approach can make use of large amounts of unlabeled data.