📄 Abstract
Recent advances in Text-to-SQL have achieved strong results in static,
single-turn tasks, where models generate SQL queries from natural language
questions. However, these systems fall short in real-world interactive
scenarios, where user intents evolve and queries must be refined over multiple
turns. In applications such as finance and business analytics, users
iteratively adjust query constraints or dimensions based on intermediate
results. To evaluate such dynamic capabilities, we introduce DySQL-Bench, a
benchmark assessing model performance under evolving user interactions. Unlike
previous manually curated datasets, DySQL-Bench is built through an automated
two-stage pipeline of task synthesis and verification. Structured tree
representations derived from raw database tables guide LLM-based task
generation, followed by interaction-oriented filtering and expert validation.
Human evaluation confirms 100% correctness of the synthesized data. We further
propose a multi-turn evaluation framework simulating realistic interactions
among an LLM-simulated user, the model under test, and an executable database.
The model must adapt its reasoning and SQL generation as user intents change.
DySQL-Bench covers 13 domains across BIRD and Spider 2 databases, totaling
1,072 tasks. Even GPT-4o attains only 58.34% overall accuracy and 23.81% on the
Pass@5 metric, underscoring the benchmark's difficulty. All code and data are
released at https://github.com/Aurora-slz/Real-World-SQL-Bench .
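The abstract does not define Pass@5. Assuming "overall accuracy" is an average over individual runs, a Pass@5 of 23.81% sitting below 58.34% accuracy rules out the usual code-generation pass@k (at least one of k runs succeeds, Chen et al., 2021), which can never fall below single-run accuracy. It is consistent with an "all k runs succeed" metric such as τ-bench's pass^k. The sketch below contrasts the two estimators; the function names and the toy success counts are illustrative assumptions, not the paper's evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k runs succeeds), given c successes in n runs."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate of P(all k runs succeed); math.comb returns 0 when c < k."""
    return comb(c, k) / comb(n, k)

def benchmark_scores(successes: list[int], n: int, k: int) -> dict[str, float]:
    """Average per-task scores; successes[i] = successful runs of task i out of n."""
    tasks = len(successes)
    return {
        "accuracy": sum(c / n for c in successes) / tasks,
        f"pass@{k}": sum(pass_at_k(n, c, k) for c in successes) / tasks,
        f"pass^{k}": sum(pass_hat_k(n, c, k) for c in successes) / tasks,
    }

# Hypothetical toy run: three tasks, five runs each, solved 5, 3 and 0 times.
print(benchmark_scores([5, 3, 0], n=5, k=5))
```

On this toy data, pass^5 (1/3) falls below accuracy (8/15) while pass@5 (2/3) rises above it: a task solved in only some runs counts as a success for pass@k but as a failure for pass^k, which is why an all-runs metric is the stricter test of consistency across repeated multi-turn interactions.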
Authors (9)
Linzhuang Sun
Tianyu Guo
Hao Liang
Yuying Li
Qifeng Cai
Jingxuan Wei
(and 3 more authors)
Submitted
October 30, 2025
Key Contributions
This paper introduces DySQL-Bench, a benchmark for evaluating dynamic, multi-turn Text-to-SQL, addressing the limitations of static, single-turn systems in interactive scenarios where user intents evolve over the course of database exploration. Tasks are produced by an automated two-stage pipeline: LLM-based task synthesis guided by tree representations derived from raw database tables, followed by interaction-oriented filtering and expert validation. Models are then evaluated in a multi-turn loop involving an LLM-simulated user and an executable database.
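The abstract states only that tree representations derived from raw database tables guide task generation; it does not say how those trees are built. The sketch below assumes one plausible construction: start from a root row and follow foreign keys down to the rows that reference it, then serialize the tree into a generation prompt. The schema, the helpers `child_links`, `build_tree` and `generation_prompt`, the depth limit, and the prompt wording are all hypothetical, not the authors' pipeline.

```python
import json
import sqlite3

def child_links(conn: sqlite3.Connection, parent: str) -> list[tuple[str, str, str]]:
    """Return (child_table, child_col, parent_col) for every foreign key pointing at `parent`."""
    tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
    links = []
    for child in tables:
        # PRAGMA foreign_key_list columns: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f"PRAGMA foreign_key_list({child})"):
            if fk[2] == parent:
                links.append((child, fk[3], fk[4]))
    return links

def build_tree(conn: sqlite3.Connection, table: str, row: sqlite3.Row, depth: int = 2) -> dict:
    """Hypothetical: nest every row that references `row`, up to `depth` foreign-key hops."""
    node = {"table": table, "row": dict(row), "children": []}
    if depth == 0:
        return node
    for child, child_col, parent_col in child_links(conn, table):
        rows = conn.execute(
            f"SELECT * FROM {child} WHERE {child_col} = ?", (row[parent_col],)
        ).fetchall()
        for r in rows:
            node["children"].append(build_tree(conn, child, r, depth - 1))
    return node

def generation_prompt(tree: dict) -> str:
    """Hypothetical prompt text handed to an LLM task generator (the LLM call is omitted)."""
    return ("Write a multi-turn data-analysis task whose SQL answers are grounded in "
            "this record tree:\n" + json.dumps(tree, indent=2))

# Toy schema, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets dict(row) and row["col"] work
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id), amount REAL);
INSERT INTO customers VALUES (1, 'Acme');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 90.5);
""")
root = conn.execute("SELECT * FROM customers WHERE id = 1").fetchone()
tree = build_tree(conn, "customers", root)
print(generation_prompt(tree))
```

Grounding generation in a concrete record tree, rather than in the schema alone, gives the generator real values to refer to across turns, which is one way a verification stage could later check each synthesized query against the database.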
Business Value
By measuring how well models adapt to evolving user intents, the benchmark supports the development of Text-to-SQL systems that let business users explore databases in natural language and iteratively refine their queries based on intermediate results. Such systems could shorten the path to insights and support decision-making in finance and business analytics.