Research Paper. Audience: GIS analysts, data scientists, urban planners, environmental scientists, AI researchers

From Questions to Queries: An AI-powered Multi-Agent Framework for Spatial Text-to-SQL

Abstract

The complexity of Structured Query Language (SQL) and the specialized nature of geospatial functions in tools like PostGIS present significant barriers to non-experts seeking to analyze spatial data. While Large Language Models (LLMs) offer promise for translating natural language into SQL (Text-to-SQL), single-agent approaches often struggle with the semantic and syntactic complexities of spatial queries. To address this, we propose a multi-agent framework designed to accurately translate natural language questions into spatial SQL queries. The framework integrates several innovative components, including a knowledge base with programmatic schema profiling and semantic enrichment, embeddings for context retrieval, and a collaborative multi-agent pipeline as its core. This pipeline comprises specialized agents for entity extraction, metadata retrieval, query logic formulation, SQL generation, and a review agent that performs programmatic and semantic validation of the generated SQL to ensure correctness (self-verification). We evaluate our system using both the non-spatial KaggleDBQA benchmark and a new, comprehensive SpatialQueryQA benchmark that includes diverse geometry types, predicates, and three levels of query complexity. On KaggleDBQA, the system achieved an overall accuracy of 81.2% (221 out of 272 questions) after the review agent's review and corrections. For spatial queries, the system achieved an overall accuracy of 87.7% (79 out of 90 questions), compared with 76.7% without the review agent. Beyond accuracy, results also show that in some instances the system generates queries that are more semantically aligned with user intent than those in the benchmarks. This work makes spatial analysis more accessible, and provides a robust, generalizable foundation for spatial Text-to-SQL systems, advancing the development of autonomous GIS.
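The agent sequence described in the abstract can be pictured as a chain of functions that pass a shared state along. The Python sketch below is only an illustration of that data flow: every function name, the `PipelineState` fields, and the hard-coded `SCHEMA` are hypothetical, and each stub stands in for what would be an LLM call or a knowledge-base lookup in the actual system.

```python
from dataclasses import dataclass, field

# Hypothetical toy schema (table -> columns); a real system would profile the database.
SCHEMA = {"parks": ["name", "geom"], "roads": ["name", "geom"]}

@dataclass
class PipelineState:
    question: str
    entities: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    logic: str = ""
    sql: str = ""
    review_notes: list = field(default_factory=list)

def extract_entities(state):
    # Stub: keep question words that match a known table name.
    words = [w.strip("?.,").lower() for w in state.question.split()]
    state.entities = [w for w in words if w in SCHEMA]
    return state

def retrieve_metadata(state):
    # Stub: look up columns for each extracted entity.
    state.metadata = {t: SCHEMA[t] for t in state.entities}
    return state

def formulate_logic(state):
    # Stub: describe the intended query in plain language.
    state.logic = "count rows in " + ", ".join(state.metadata)
    return state

def generate_sql(state):
    # Stub: emit a fixed query shape for the first retrieved table, if any.
    if state.metadata:
        table = next(iter(state.metadata))
        state.sql = f"SELECT COUNT(*) FROM {table};"
    return state

def review(state):
    # Stub: a single programmatic check; semantic review is out of scope here.
    if not state.sql:
        state.review_notes.append("no SQL generated")
    return state

def run_pipeline(question):
    state = PipelineState(question)
    for agent in (extract_entities, retrieve_metadata, formulate_logic,
                  generate_sql, review):
        state = agent(state)
    return state

result = run_pipeline("How many parks are there?")
print(result.sql)
```

Keeping each agent as a function over one state object makes it easy to log or replace a single stage without touching the others.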

Authors (4)

Ali Khosravi Kazazi

Zhenlong Li

M. Naser Lessani

Guido Cervone

Submitted

October 23, 2025

arXiv Category

cs.AI

Key Contributions

Proposes a multi-agent framework for accurate spatial Text-to-SQL translation that addresses the difficulty single-agent LLM approaches have with complex spatial queries. It integrates a knowledge base with programmatic schema profiling and semantic enrichment, embeddings for context retrieval, and specialized agents for entity extraction, metadata retrieval, query logic formulation, SQL generation, and validation, making spatial querying more accessible to non-experts.
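The validation step mentioned above combines programmatic and semantic checks. As a rough illustration of the programmatic half only, the following Python sketch asks a database engine to compile a candidate query without running it and returns the error text if compilation fails. It uses the standard-library `sqlite3` module as a stand-in for PostGIS, and the `validate_sql` helper and the `parks` table are hypothetical; the checks the paper actually performs are not described in this summary.

```python
import sqlite3

def validate_sql(conn, sql):
    """Return (ok, message) for a candidate query.

    Prefixing with EXPLAIN makes SQLite compile the statement and return
    its plan instead of executing it, so syntax and schema errors surface.
    """
    try:
        conn.execute("EXPLAIN " + sql)
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, "ok"

# Hypothetical table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parks (name TEXT, area_km2 REAL)")

good = validate_sql(conn, "SELECT name FROM parks WHERE area_km2 > 1")
bad = validate_sql(conn, "SELECT name FROM lakes")
print(good)
print(bad)
```

An error message returned this way could be fed back to the SQL generation step as a correction hint.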

Business Value

Democratizes access to geospatial data analysis by allowing users to query complex spatial databases using natural language, accelerating insights for planning, research, and operations.

Paper Metadata

Innovation Type

Framework/Pipeline

Deployment Feasibility

Moderate to high; requires integration with existing spatial databases and LLM infrastructure.

Limitations Addressed

Complexity of SQL and geospatial functions (PostGIS) for non-experts; difficulty of single-agent LLMs in handling the semantic and syntactic complexities of spatial queries.

Performance Gains

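On the SpatialQueryQA benchmark, the review agent raises overall accuracy from 76.7% to 87.7% (79 out of 90 questions). On the non-spatial KaggleDBQA benchmark, the system reaches 81.2% (221 out of 272 questions) after the review agent's corrections.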
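The percentages quoted in the abstract can be re-derived from the raw counts. The Python sketch below does that arithmetic with exact fractions; the count of 69 correct answers without the review agent is inferred here from the reported 76.7% and is not stated in the abstract.

```python
from fractions import Fraction

def accuracy_pct(correct: int, total: int) -> Fraction:
    """Exact accuracy as a percentage."""
    return Fraction(100 * correct, total)

spatial_with_review = accuracy_pct(79, 90)    # SpatialQueryQA, with review agent
kaggle_with_review = accuracy_pct(221, 272)   # KaggleDBQA, with review agent

# Assumption: 76.7% of 90 questions corresponds to 69 correct answers.
implied_without_review = round(0.767 * 90)
spatial_without_review = accuracy_pct(implied_without_review, 90)

# Improvement attributable to the review agent, in percentage points.
gain_points = spatial_with_review - spatial_without_review

print(f"spatial, with review:    {float(spatial_with_review):.2f}%")
print(f"spatial, without review: {float(spatial_without_review):.2f}%")
print(f"KaggleDBQA, with review: {float(kaggle_with_review):.2f}%")
print(f"gain from review:        {float(gain_points):.2f} points")
```

Under that assumption the review agent accounts for 10 additional correct spatial queries, roughly 11 percentage points. Note that 79/90 is about 87.78%, so the reported 87.7% looks truncated rather than rounded.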

Technical Tags

Text-to-SQL, spatial SQL, PostGIS, multi-agent framework, geospatial functions, natural language queries, knowledge base, schema profiling, semantic enrichment, embeddings, query validation

Research Topics

Natural Language Processing, Databases, Geospatial Data Analysis, Large Language Models, Multi-agent Systems

Methods & Architectures

Multi-agent framework; Knowledge base with programmatic schema profiling; Semantic enrichment; Embeddings for context retrieval; Specialized agents (entity extraction, metadata retrieval, query logic, SQL generation, review); Programmatic and semantic validation; Large Language Model (LLM); Multi-agent system
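Embedding-based context retrieval generally means ranking stored schema descriptions by vector similarity to the question. The toy Python sketch below uses word-count vectors and cosine similarity purely to show the ranking step; a real system would use a learned embedding model, and the `DESCRIPTIONS` entries and function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical enriched descriptions, one per table.
DESCRIPTIONS = {
    "parks": "public parks and green spaces polygon geometry",
    "roads": "road centerlines line geometry with speed limit",
    "schools": "school locations point geometry with enrollment",
}

def embed(text):
    # Stand-in "embedding": a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    # Rank tables by similarity of their description to the question.
    q = embed(question)
    ranked = sorted(DESCRIPTIONS,
                    key=lambda t: cosine(q, embed(DESCRIPTIONS[t])),
                    reverse=True)
    return ranked[:k]

print(retrieve("which parks have the most green spaces"))
```

Only the top-ranked descriptions would then be passed to later agents, which keeps prompts short when the schema is large.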

Applications & Tasks

Geospatial data analysis, Database querying, Urban planning, Environmental science, GIS, Translating natural language questions into spatial SQL, Simplifying complex spatial data querying, Improving accuracy of Text-to-SQL for geospatial data, Spatial Text-to-SQL generation, Geospatial data exploration, Automated report generation from spatial data

Related Fields

Geographic Information Systems (GIS), Database Management, Natural Language Understanding, Artificial Intelligence, Spatial Analysis

Keywords

Text-to-SQL, spatial SQL, geospatial, PostGIS, LLM, multi-agent, natural language query, database, GIS, query generation, schema profiling

Academic Context

Natural Language Processing, Databases, Geospatial Data Analysis, Large Language Models, Multi-agent Systems

Technology Stack

Frameworks & Libraries

Multi-agent framework, PostGIS

Commercial Potential

Potential Products

Natural language interface for GIS databases, Automated spatial data analysis tools, Query generation services for spatial data

Target Industries

Urban Planning, Environmental Management, Real Estate, Logistics, Government/Public Sector

Use Case Examples

Asking 'Show me all parks within 1 mile of the city center' and getting the correct SQL query; Analyzing environmental impact reports using natural language; Querying property databases for specific geographic criteria
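For the parks example above, one plausible target query uses the PostGIS function `ST_DWithin`, which tests whether two geometries lie within a given distance. The Python sketch below builds such a query with named placeholders in the style used by psycopg; the `parks` and `city_centers` tables, their columns, and the city name are assumptions made for illustration, not part of the paper's benchmark.

```python
# One international mile expressed in meters.
METERS_PER_MILE = 1609.344

def parks_near_center_sql():
    """Return a parameterized PostGIS query (hypothetical schema).

    Casting to geography makes ST_DWithin interpret the distance in
    meters; this assumes the geom columns store lon/lat (SRID 4326).
    """
    return """
        SELECT p.name
        FROM parks AS p
        JOIN city_centers AS c
          ON ST_DWithin(p.geom::geography, c.geom::geography, %(radius_m)s)
        WHERE c.city = %(city)s;
    """

sql = parks_near_center_sql()
params = {"radius_m": 1 * METERS_PER_MILE, "city": "Springfield"}
print(sql)
print(params)
```

Passing the radius and city as parameters, rather than formatting them into the string, avoids SQL injection when the values come from user input.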

Competitive Edge

Addresses the specific challenges of spatial Text-to-SQL, offering a more specialized and accurate solution than general-purpose Text-to-SQL models.

Market Opportunity

Large market for GIS software and spatial data analytics.

Revenue Models

Software licensing, API access, consulting services.

Resource Requirements

Compute Needs

Moderate to High, depending on the LLM and the complexity of the spatial database.

Data Requirements

Access to spatial databases (e.g., PostGIS) and their schemas.

Deployment Constraints

Requires integration with existing database systems; performance depends on the quality of schema information and LLM capabilities.

Scalability

Scalable to different spatial databases and query complexities.

Production Readiness

Maturity Level

Research

Time to Market

1-3 years

Patent Potential

Moderate, for the multi-agent framework and specific agent designs.
