Abstract
Generative models have become significant assets in the exploration and identification of new materials, enabling the rapid proposal of candidate crystal structures that satisfy target properties. Despite the increasing adoption of diverse architectures, a rigorous comparative evaluation of their performance on materials datasets is lacking. In this work, we present a systematic benchmark of three representative generative models: AtomGPT (a transformer-based model), the Crystal Diffusion Variational Autoencoder (CDVAE), and FlowMM (a Riemannian flow matching model). These models were trained to reconstruct crystal structures from subsets of two publicly available superconductivity datasets: JARVIS Supercon 3D and DS A/B from the Alexandria database. Performance was assessed using the Kullback-Leibler (KL) divergence between predicted and reference distributions of lattice parameters, as well as the mean absolute error (MAE) of individual lattice constants. On both the KL divergence and MAE scores, CDVAE performs most favorably, followed by AtomGPT and then FlowMM. All benchmarking code and model configurations will be made publicly available at https://github.com/atomgptlab/atombench_inverse.
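The two metrics named in the abstract are standard and easy to reproduce. Below is a minimal sketch, not taken from the AtomBench repository, of how a histogram-based KL divergence between reference and generated lattice-parameter distributions and an MAE over paired lattice constants might be computed; the function names, bin count, and synthetic data are illustrative assumptions, and the paper's actual binning and pairing conventions may differ.

```python
import numpy as np
from scipy.stats import entropy

def kl_divergence(reference, generated, bins=50):
    """Histogram-based D_KL(P_ref || P_gen) for one lattice parameter (e.g., a, b, c)."""
    lo = min(reference.min(), generated.min())
    hi = max(reference.max(), generated.max())
    p, _ = np.histogram(reference, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(generated, bins=bins, range=(lo, hi), density=True)
    eps = 1e-10  # avoid log(0) and division by zero in empty bins
    return entropy(p + eps, q + eps)  # scipy normalizes and sums p * log(p / q)

def mean_absolute_error(reference, generated):
    """MAE between paired reference and predicted lattice constants."""
    return np.abs(np.asarray(reference) - np.asarray(generated)).mean()

# Example with synthetic lattice constants in angstroms (illustrative only)
rng = np.random.default_rng(0)
a_ref = rng.normal(5.0, 0.5, size=1000)
a_gen = rng.normal(5.2, 0.6, size=1000)
print("KL divergence:", kl_divergence(a_ref, a_gen))
print("MAE:", mean_absolute_error(a_ref, a_gen))
```

In this sketch the KL divergence compares the overall shape of the two distributions (it needs no one-to-one pairing of structures), while the MAE requires each generated structure to be matched to its reference counterpart, which is why the two scores can rank models differently.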
Authors (3)
Charles Rhys Campbell
Aldo H. Romero
Kamal Choudhary
Submitted
October 17, 2025
Key Contributions
This paper presents AtomBench, a systematic benchmark for evaluating generative atomic structure models, including GPT, Diffusion, and Flow architectures. It addresses the lack of rigorous comparative evaluation by assessing performance on superconductivity datasets using KL divergence and MAE, providing insights into the strengths and weaknesses of different generative approaches for materials discovery.
Business Value
Accelerates the discovery of new materials with desired properties, potentially leading to breakthroughs in areas like superconductivity, energy storage, and catalysis, by providing a standardized way to evaluate and select the best generative models.