
Element2Vec: Build Chemical Element Representation from Text for Property Prediction

📄 Abstract

Accurate property data for chemical elements is crucial for materials design and manufacturing, but many properties are difficult to measure directly because of equipment constraints. Traditional methods predict a property from the properties of other elements, or from related properties, via numerical analysis, but they often fail to model complex relationships; after all, not all characteristics can be represented as scalars. Recent efforts have explored advanced AI tools such as language models for property estimation, but these still suffer from hallucinations and a lack of interpretability. In this paper, we investigate Element2Vec to effectively represent chemical elements from natural language to support research in the natural sciences. Given the text parsed from Wikipedia pages, we use language models to generate both a single general-purpose embedding (Global) and a set of attribute-highlighted vectors (Local). Beyond the complicated relationships across elements, computational challenges arise from 1) the discrepancy in text distribution between common descriptions and specialized scientific texts, and 2) the extremely limited data: with only 118 known elements, data for specific properties is often highly sparse and incomplete. We therefore also design a test-time training method based on self-attention to mitigate the prediction error of vanilla regression. We hope this work can pave the way for advancing AI-driven discovery in materials science.
Authors (6): Yuanhao Li, Keyuan Lai, Tianqi Wang, Qihao Liu, Jiawei Ma, Yuan-Chao Hu
Submitted: October 15, 2025
arXiv Category: cs.CL

Key Contributions

This paper introduces Element2Vec, a novel method for generating effective representations of chemical elements from natural language text. It addresses the limitations of traditional numerical methods and current AI approaches by providing both general-purpose and attribute-highlighted embeddings, which are crucial for accurate property prediction in materials design and manufacturing.
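
To make the prediction setting concrete, here is a minimal sketch in Python of the problem the abstract describes: one embedding per element, with a property value known for only some of the 118 elements. It is not the authors' test-time training method. It contrasts a plain ridge-regression baseline with a simple attention-weighted average over labeled elements. The function names, the embedding dimension, the temperature and regularization parameters, and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np


def ridge_predict(emb_labeled, y_labeled, emb_query, alpha=1.0):
    """Baseline: closed-form ridge regression from embeddings to a scalar property."""
    d = emb_labeled.shape[1]
    gram = emb_labeled.T @ emb_labeled + alpha * np.eye(d)
    w = np.linalg.solve(gram, emb_labeled.T @ y_labeled)
    return emb_query @ w


def attention_predict(emb_labeled, y_labeled, emb_query, temperature=1.0):
    """Illustrative alternative: softmax-weighted average of known property values.

    Each query element attends to the labeled elements via scaled dot-product
    scores, so every prediction is a convex combination of observed values.
    """
    d = emb_labeled.shape[1]
    scores = emb_query @ emb_labeled.T / (np.sqrt(d) * temperature)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ y_labeled, weights


# Placeholder data only: random vectors stand in for text-derived embeddings,
# and random numbers stand in for a sparsely measured property.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(118, 16))   # one row per element (assumed dim 16)
known = np.arange(40)                     # indices with a measured value (assumed)
unknown = np.arange(40, 118)              # indices to be estimated
values = rng.normal(size=known.size)

ridge_est = ridge_predict(embeddings[known], values, embeddings[unknown])
attn_est, attn_weights = attention_predict(embeddings[known], values, embeddings[unknown])
```

One property of the attention-weighted form is that its estimates cannot leave the range of the observed values, and the weights show which labeled elements each estimate draws on. A linear fit on very few labels has neither property.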

Business Value

Could enable faster and more accurate discovery of new materials and chemical compounds by giving AI-driven design and analysis better element representations, potentially reducing experimental costs and time-to-market for new products.