arxiv_cv · Research Paper · Audience: AI researchers in chemistry, computational chemists, data scientists in pharma/materials

RxnCaption: Reformulating Reaction Diagram Parsing as Visual Prompt Guided Captioning

Abstract

Large-scale chemical reaction datasets are crucial for AI research in chemistry. However, existing chemical reaction data often exist as images within papers, which are not machine-readable and therefore cannot be used to train machine learning models. To address this challenge, we propose the RxnCaption framework for the task of chemical Reaction Diagram Parsing (RxnDP). Our framework reformulates the traditional coordinate-prediction-driven parsing process as an image captioning problem, which Large Vision-Language Models (LVLMs) handle naturally. We introduce a strategy termed "BBox and Index as Visual Prompt" (BIVP), which uses our state-of-the-art molecular detector, MolYOLO, to pre-draw molecular bounding boxes and indices directly onto the input image. This turns downstream parsing into a natural-language description problem. Extensive experiments show that the BIVP strategy significantly improves structural extraction quality while simplifying model design. We further construct the RxnCaption-11k dataset, an order of magnitude larger than prior real-world literature benchmarks, with a balanced test subset across four layout archetypes. Experiments demonstrate that our model, RxnCaption-VL, achieves state-of-the-art performance on multiple metrics. We believe our method, dataset, and models will advance structured information extraction from chemical literature and catalyze broader AI applications in chemistry. We will release data, models, and code on GitHub.
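The BIVP idea can be sketched as a preprocessing step, assuming the detector returns axis-aligned boxes as `(x0, y0, x1, y1)` pixel coordinates. MolYOLO's real interface is not described in this summary, so `Box`, `assign_indices`, `label_anchor` and the reading-order numbering rule are all hypothetical. What the sketch illustrates is the core of the reformulation: each molecule gets a small integer index drawn next to its box, so the LVLM can refer to molecules by index in its caption instead of predicting coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned molecule bounding box in pixels (hypothetical detector output format)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def assign_indices(boxes: list[Box], row_tolerance: float = 20.0) -> dict[int, Box]:
    """Number boxes from 1 in approximate reading order: rows top to bottom, then left to right.

    A box joins the current row if its vertical centre is within `row_tolerance`
    pixels of that row's first box (an assumed heuristic, not the paper's rule).
    """
    rows: list[list[Box]] = []
    for box in sorted(boxes, key=lambda b: b.cy):
        if rows and abs(box.cy - rows[-1][0].cy) <= row_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])
    ordered = [box for row in rows for box in sorted(row, key=lambda b: b.cx)]
    return {i: box for i, box in enumerate(ordered, start=1)}


def label_anchor(box: Box, margin: float = 4.0) -> tuple[float, float]:
    """Where to draw the index label: just above the box's top-left corner, clamped to the image top."""
    return (box.x0, max(0.0, box.y0 - margin))


# Usage: four detected molecules, three on one line and one below them.
detections = [Box(300, 50, 400, 120), Box(20, 40, 120, 130),
              Box(500, 60, 600, 110), Box(30, 300, 150, 380)]
# Expected numbering: 1 = left, 2 = middle, 3 = right molecule of the top row, 4 = the one below.
for idx, box in assign_indices(detections).items():
    print(idx, box, label_anchor(box))
```

The actual drawing could be done with an imaging library such as Pillow (`ImageDraw.Draw(img).rectangle(...)` for each box and `.text(...)` at `label_anchor(box)` for its index); it is omitted here to keep the sketch dependency-free. Reading-order numbering is only one plausible convention; any stable ordering works as long as the same indices are used when interpreting the model's output.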

Key Contributions

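Reformulates chemical reaction diagram parsing as a visual-prompt-guided captioning problem, allowing Large Vision-Language Models (LVLMs) to parse reaction diagrams through natural-language description rather than coordinate prediction. Introduces the BBox and Index as Visual Prompt (BIVP) strategy, which uses a molecular detector (MolYOLO) to pre-draw molecule bounding boxes and indices on the input image, improving structural extraction quality while simplifying model design. Constructs RxnCaption-11k, a dataset an order of magnitude larger than prior real-world literature benchmarks.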
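The benefit of the index prompt shows up on the output side. Below is a minimal sketch, assuming the LVLM's caption names each reaction's reactants, conditions and products by the indices drawn on the image. The caption grammar (`Reactants: ...; Conditions: ...; Products: ...`), `Reaction` and `parse_caption` are hypothetical illustrations; this summary does not specify RxnCaption's actual output format or role set. Under that assumption, recovering structured, box-grounded reactions from the model's text is plain string parsing, with no coordinate decoding.

```python
import re
from dataclasses import dataclass


@dataclass
class Reaction:
    """One parsed reaction; molecules are referred to by the indices drawn on the image."""
    reactants: list[int]
    conditions: list[int]
    products: list[int]


# Hypothetical caption grammar, one reaction per line:
#   "Reactants: 1, 2; Conditions: 4; Products: 3"
_ROLE = re.compile(r"(reactants|conditions|products)\s*:\s*([\d,\s]*)", re.IGNORECASE)


def parse_caption(caption: str, drawn_indices: set[int]) -> list[Reaction]:
    """Turn an LVLM caption into structured reactions.

    Raises ValueError if the caption cites an index that was never drawn on the
    image, which catches hallucinated molecule references.
    """
    reactions = []
    for line in caption.splitlines():
        matches = _ROLE.findall(line)
        if not matches:
            continue  # skip blank or free-text lines that name no roles
        roles: dict[str, list[int]] = {"reactants": [], "conditions": [], "products": []}
        for role, numbers in matches:
            ids = [int(n) for n in re.findall(r"\d+", numbers)]
            unknown = set(ids) - drawn_indices
            if unknown:
                raise ValueError(f"caption cites undrawn indices {sorted(unknown)}")
            roles[role.lower()].extend(ids)
        reactions.append(Reaction(**roles))
    return reactions


caption = "Reactants: 1, 2; Conditions: 4; Products: 3\nReactants: 3; Conditions: ; Products: 5"
for rxn in parse_caption(caption, drawn_indices={1, 2, 3, 4, 5}):
    print(rxn)
```

Checking cited indices against the set that was actually drawn is a cheap consistency check, and each valid index maps straight back to the box it labels, so the model never has to output a coordinate.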

Business Value

Automates the extraction of chemical reaction data from diagrams in the scientific literature, making large amounts of existing published knowledge machine-readable and thereby accelerating research and development in areas such as drug discovery and materials science.