Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Large-scale chemical reaction datasets are crucial for AI research in
chemistry. However, existing chemical reaction data often exist as images
within papers, making them not machine-readable and unusable for training
machine learning models. In response to this challenge, we propose the
RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP).
Our framework reformulates the traditional coordinate prediction driven parsing
process into an image captioning problem, which Large Vision-Language Models
(LVLMs) handle naturally. We introduce a strategy termed "BBox and Index as
Visual Prompt" (BIVP), which uses our state-of-the-art molecular detector,
MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the
input image. This turns the downstream parsing into a natural-language
description problem. Extensive experiments show that the BIVP strategy
significantly improves structural extraction quality while simplifying model
design. We further construct the RxnCaption-11k dataset, an order of magnitude
larger than prior real-world literature benchmarks, with a balanced test subset
across four layout archetypes. Experiments demonstrate that RxnCaption-VL
achieves state-of-the-art performance on multiple metrics. We believe our
method, dataset, and models will advance structured information extraction from
chemical literature and catalyze broader AI applications in chemistry. We will
release data, models, and code on GitHub.
Key Contributions
Reformulates chemical reaction diagram parsing as a visual prompt guided captioning problem, enabling Large Vision-Language Models (LVLMs) to process chemical data. Introduces the BBox and Index as Visual Prompt (BIVP) strategy using a molecular detector (MolYOLO) to improve structural extraction quality and simplify model design.
Business Value
Automates the extraction of crucial chemical reaction data from scientific literature, accelerating research and development in areas like drug discovery and materials science by making vast amounts of existing knowledge machine-readable.