📄 Abstract
Accurate property data for chemical elements is crucial for materials design
and manufacturing, but many properties are difficult to measure directly due to
equipment constraints. While traditional methods use the properties of other
elements or related properties for prediction via numerical analyses, they
often fail to model complex relationships. After all, not all characteristics
can be represented as scalars. Recent efforts have been made to explore
advanced AI tools such as language models for property estimation, but they
still suffer from hallucinations and a lack of interpretability. In this paper,
we investigate Element2Vec to effectively represent chemical elements from
natural language to support research in the natural sciences. Given the text
parsed from Wikipedia pages, we use language models to generate both a single
general-purpose embedding (Global) and a set of attribute-highlighted vectors
(Local). Beyond the complicated relationships across elements,
computational challenges also arise from 1) the discrepancy in text
distribution between common descriptions and specialized scientific texts, and
2) the extremely limited data, i.e., with only 118 known elements, data for
specific properties is often highly sparse and incomplete. Thus, we also design
a test-time training method based on self-attention to clearly mitigate the
prediction error caused by vanilla regression. We hope this work can pave the
way for advancing AI-driven discovery in materials science.
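
To make the attention-based idea concrete, here is a minimal sketch of predicting a missing property as a softmax-weighted average over elements whose values are known. It is not the paper's method and does not implement the test-time training step. The function name `attention_predict`, the `temperature` parameter, the placeholder element names and all numbers are hypothetical and chosen only for illustration.

```python
import math

def attention_predict(query, keys, values, temperature=1.0):
    """Predict a property for `query` as an attention-weighted average.

    query:  embedding of the element whose property is missing
    keys:   embeddings of elements whose property is known
    values: the known property values, aligned with `keys`
    """
    # Scaled dot-product scores between the query and each known element.
    scale = math.sqrt(len(query)) * temperature
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    prediction = sum(w * v for w, v in zip(weights, values))
    return prediction, weights

# Toy, made-up 2-D embeddings and property values (assumptions, not real data).
known_embeddings = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
known_values = {"A": 10.0, "B": 20.0, "C": 11.0}
query_embedding = [1.0, 0.05]

names = list(known_embeddings)
prediction, weights = attention_predict(
    query_embedding,
    [known_embeddings[n] for n in names],
    [known_values[n] for n in names],
    temperature=0.1,
)
print(prediction, dict(zip(names, weights)))
```

Because the weights are non-negative and sum to one, the prediction always lies between the smallest and largest known values. That is a conservative choice when, as the abstract notes, only a handful of labeled elements are available.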
Authors (6)
Yuanhao Li
Keyuan Lai
Tianqi Wang
Qihao Liu
Jiawei Ma
Yuan-Chao Hu
Submitted
October 15, 2025
Key Contributions
This paper introduces Element2Vec, a novel method for generating effective representations of chemical elements from natural language text. It addresses the limitations of traditional numerical methods and current AI approaches by providing both general-purpose and attribute-highlighted embeddings, which are crucial for accurate property prediction in materials design and manufacturing.
Business Value
Enables faster and more accurate discovery of new materials and chemical compounds by providing better representations for AI-driven design and analysis, potentially reducing experimental costs and time-to-market for new products.