SheetBrain: A Neuro-Symbolic Agent for Accurate Reasoning over Complex and Large Spreadsheets

Abstract

Understanding and reasoning over complex spreadsheets remain fundamental challenges for large language models (LLMs), which often struggle to capture the complex structure of tables accurately and to ensure that their reasoning is correct. In this work, we propose SheetBrain, a neuro-symbolic, dual-workflow agent framework designed for accurate reasoning over tabular data, supporting both spreadsheet question answering and manipulation tasks. SheetBrain comprises three core modules: an understanding module, which produces a comprehensive overview of the spreadsheet, including a sheet summary and query-based problem insight to guide reasoning; an execution module, which integrates a Python sandbox with preloaded table-processing libraries and an Excel helper toolkit for effective multi-turn reasoning; and a validation module, which verifies the correctness of reasoning and answers, triggering re-execution when necessary. We evaluate SheetBrain on multiple public tabular QA and manipulation benchmarks, and introduce SheetBench, a new benchmark targeting large, multi-table, and structurally complex spreadsheets. Experimental results show that SheetBrain significantly improves accuracy on both existing benchmarks and the more challenging scenarios presented in SheetBench. Our code is publicly available at https://github.com/microsoft/SheetBrain.
Authors (10)
Ziwei Wang
Jiayuan Su
Mengyu Zhou
Huaxing Zeng
Mengni Jia
Xiao Lv
+4 more
Submitted
October 22, 2025
arXiv Category
cs.CL

Key Contributions

SheetBrain is a neuro-symbolic, dual-workflow agent framework for accurate reasoning over complex spreadsheets. It combines three modules: an understanding module that summarizes the spreadsheet and the query, an execution module built on a Python sandbox with table-processing libraries, and a validation module that checks the reasoning and answer and triggers re-execution when necessary. Together they address LLM limitations in comprehending and reasoning over tabular data.
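The understand, execute, validate loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `run_agent`, `understand`, `execute`, and `validate` are hypothetical and are not taken from the SheetBrain code, and the three modules are replaced by plain Python stubs operating on a toy sheet rather than by an LLM and a sandbox.

```python
from typing import Callable, Optional

Row = dict
Sheet = list  # toy stand-in for a spreadsheet: a list of row dicts


def run_agent(
    sheet: Sheet,
    question: str,
    understand: Callable,
    execute: Callable,
    validate: Callable,
    max_rounds: int = 3,
):
    """Hypothetical control loop: understand once, then execute and
    validate until the validator accepts or the round budget is spent."""
    overview = understand(sheet, question)
    feedback: Optional[str] = None
    answer = None
    for round_no in range(1, max_rounds + 1):
        answer = execute(sheet, question, overview, feedback)
        feedback = validate(sheet, question, overview, answer)
        if feedback is None:  # validator accepted the answer
            return answer, round_no
    return answer, max_rounds


# --- Toy stubs (assumptions, standing in for the real modules) ---

def understand(sheet: Sheet, question: str) -> dict:
    # Sheet summary: column names and row count.
    return {"columns": list(sheet[0].keys()), "n_rows": len(sheet)}


def execute(sheet: Sheet, question: str, overview: dict,
            feedback: Optional[str]) -> int:
    # First attempt naively sums every row; after feedback it skips
    # the summary row, mimicking a corrected re-execution.
    rows = sheet
    if feedback:
        rows = [r for r in sheet if r["region"] != "Total"]
    return sum(r["sales"] for r in rows)


def validate(sheet: Sheet, question: str, overview: dict,
             answer: int) -> Optional[str]:
    # Cross-check the answer against the sheet's own Total row.
    total = next(r["sales"] for r in sheet if r["region"] == "Total")
    if answer == total:
        return None
    return "Answer disagrees with the Total row; exclude summary rows."


sheet = [
    {"region": "North", "sales": 10},
    {"region": "South", "sales": 15},
    {"region": "Total", "sales": 25},
]
result = run_agent(sheet, "What are total sales?", understand, execute, validate)
```

In this toy run the first execution double-counts the summary row, the validator rejects it, and the second execution is accepted. The point of the sketch is only the control flow: validation feedback is passed back into execution, and a round limit bounds the retries.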

Business Value

Automates complex data analysis and manipulation tasks involving spreadsheets, improving efficiency and accuracy for businesses relying on tabular data.