Redirecting to original paper in 30 seconds...
Click below to go immediately or wait for automatic redirect
📄 Abstract
Abstract: Complex chart understanding tasks demand advanced visual recognition and
reasoning capabilities from multimodal large language models (MLLMs). However,
current research provides limited coverage of complex chart scenarios and
computation-intensive reasoning tasks prevalent in real-world applications.
This study proposes an automated multi-stage code-driven pipeline for
systematically generating visual reasoning datasets to address these
limitations. The pipeline integrates retrieval-augmented generation (RAG) to
retrieve professional chart templates and employs chain-of-thought (CoT)
strategies to generate reasoning codes that simulate real data distributions,
thereby driving chart rendering and question-related statistical computations.
Through model-based evaluation, the pipeline enhances chart diversity and data
quality. Using this framework, we construct ChartM$^3$, a multi-dimensional and
multi-step dataset containing 38K charts and 142K Q&A pairs for training, along
with 2,871 high-quality evaluation samples for enabling practical performance
assessment. Supervised fine-tuning (SFT) and reinforcement learning (RL)
experiments demonstrate that our dataset significantly improves reasoning
capabilities and cross-domain generalization performance, enabling smaller
models to achieve performance comparable to larger-scale models in complex
chart comprehension.
Key Contributions
Proposes ChartM$^3$, a code-driven pipeline for generating multi-dimensional and multi-step visual reasoning datasets for chart comprehension. It uses RAG for chart templates and CoT strategies for reasoning codes, enabling the creation of diverse and high-quality chart-based Q&A data.
Business Value
Facilitates the development of more powerful AI systems capable of understanding complex data visualizations, leading to better automated reporting, data analysis tools, and decision support systems.