📄 Abstract
Scientific illustrations demand both high information density and
post-editability. However, current generative models have two major
limitations. First, image generation models output rasterized images that lack
semantic structure, making it impossible to access, edit, or rearrange
individual visual components in the images. Second, code-based generation
methods (TikZ or SVG), although they provide element-level control, force users
into a cumbersome "write-compile-review" cycle and do not support intuitive
manipulation. Neither approach adequately meets
the needs for efficiency, intuitiveness, and iterative modification in
scientific creation. To bridge this gap, we introduce VisPainter, a multi-agent
framework for scientific illustration built upon the model context protocol.
VisPainter orchestrates three specialized modules (a Manager, a Designer, and a
Toolbox) to collaboratively produce diagrams compatible with standard vector
graphics software. This modular, role-based design allows each element to be
explicitly represented and manipulated, enabling true element-level control:
any element can be added or modified later. To systematically evaluate the
quality of scientific illustrations, we introduce VisBench, a benchmark with
seven-dimensional evaluation metrics. It assesses high-information-density
scientific illustrations from four aspects: content, layout, visual perception,
and interaction cost. We conducted extensive ablation experiments
to verify the soundness of our architecture and the reliability of our
evaluation methods. Finally, we evaluated various vision-language models,
presenting fair and credible model rankings along with detailed comparisons of
their respective capabilities. Additionally, we isolated and quantified the
impacts of role division, step control, and description on the quality of
illustrations.
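The role split described in the abstract can be illustrated with a minimal sketch. This is not VisPainter's implementation: the class names mirror the three modules, but the `Element` schema, the method names (`add`, `modify`, `plan`, `run`, `to_svg`), and the fixed plan returned by the stub `Designer` are all assumptions made for illustration. The sketch only shows how keeping every element as an explicit object lets it be edited after generation.

```python
from dataclasses import dataclass, field
from xml.sax.saxutils import escape


@dataclass
class Element:
    """One explicitly represented diagram element (hypothetical schema)."""
    elem_id: str
    kind: str  # "rect" or "text"
    x: float
    y: float
    attrs: dict = field(default_factory=dict)


class Toolbox:
    """Holds the scene and exposes element-level operations (assumed API)."""

    def __init__(self):
        self.elements = {}

    def add(self, elem):
        self.elements[elem.elem_id] = elem

    def modify(self, elem_id, **changes):
        elem = self.elements[elem_id]
        for key, value in changes.items():
            if key in ("x", "y"):
                setattr(elem, key, value)
            else:
                elem.attrs[key] = value

    def to_svg(self, width=200, height=100):
        parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
        for e in self.elements.values():
            if e.kind == "rect":
                parts.append(
                    f'<rect id="{e.elem_id}" x="{e.x}" y="{e.y}" '
                    f'width="{e.attrs.get("width", 60)}" height="{e.attrs.get("height", 30)}" '
                    f'fill="{e.attrs.get("fill", "none")}" stroke="black"/>'
                )
            elif e.kind == "text":
                parts.append(
                    f'<text id="{e.elem_id}" x="{e.x}" y="{e.y}">{escape(e.attrs.get("text", ""))}</text>'
                )
        parts.append("</svg>")
        return "\n".join(parts)


class Designer:
    """Stub standing in for a model that proposes drawing steps."""

    def plan(self, request):
        # A real designer would derive steps from the request; these are fixed.
        return [
            ("add", Element("box1", "rect", 10, 10, {"width": 80, "height": 40})),
            ("add", Element("label1", "text", 20, 35, {"text": request})),
        ]


class Manager:
    """Runs the designer's steps one at a time against the toolbox."""

    def __init__(self, designer, toolbox):
        self.designer = designer
        self.toolbox = toolbox

    def run(self, request):
        for op, elem in self.designer.plan(request):
            if op == "add":
                self.toolbox.add(elem)
        return self.toolbox


toolbox = Manager(Designer(), Toolbox()).run("Encoder")
toolbox.modify("box1", fill="lightblue", x=15)  # edit one element after generation
svg = toolbox.to_svg()
```

Because each element keeps its own identifier, a later edit touches only that element and the rest of the scene is left unchanged, which is the property the abstract contrasts with rasterized output.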
Authors (9)
Jianwen Sun
Fanrui Zhang
Yukang Feng
Chuanhao Li
Zizhen Li
Jiaxin Ai
+3 more
Submitted
October 31, 2025
Key Contributions
VisPainter is a novel multi-agent framework that addresses the limitations of current scientific illustration methods by enabling editable vector graphics. It bridges the gap between rasterized images lacking semantic structure and cumbersome code-based generation, offering an intuitive and efficient approach for creating and modifying scientific diagrams.
Business Value
Streamlines the creation and editing of scientific illustrations, saving researchers and publishers significant time and effort. Enables more dynamic and interactive scientific communication.