
Scaling Graph Chain-of-Thought Reasoning: A Multi-Agent Framework with Efficient LLM Serving

Abstract

Graph Chain-of-Thought (Graph-CoT) enables large language models (LLMs) to perform step-by-step reasoning over graph-structured knowledge, but existing pipelines suffer from low accuracy, excessive token usage, high latency, and low throughput due to single-agent monolithic prompts, repeated context re-encoding, and inefficient serving execution. We present GLM, the first multi-agent Graph-CoT system co-designed with an optimized LLM serving architecture. GLM decomposes reasoning into specialized agents for classification, reasoning, action generation, and graph retrieval, enabling branching and selective context sharing to reduce prompt length and reasoning iterations while preserving reasoning quality, thereby improving accuracy and reducing overall token consumption. To scale inference, we introduce a Graph-CoT-aware LLM inference mechanism with graph-specific KV-cache management, priority-based eviction, and pipelined execution to improve serving efficiency. Experiments demonstrate that GLM improves answer accuracy by up to 38%, reduces token cost by up to 95.7%, lowers inference latency by 90.3%, and achieves up to 15.1x higher throughput compared to state-of-the-art Graph-CoT baselines, enabling efficient adoption for complex real-world reasoning at scale.
Authors (14)
Chengying Huan
Ziheng Meng
Yongchao Liu
Zhengyi Yang
Yun Zhu
Yue Yun
+8 more
Submitted
November 3, 2025
arXiv Category
cs.LG
arXiv PDF

Key Contributions

Introduces GLM, the first multi-agent Graph-CoT system integrated with an optimized LLM serving architecture. It decomposes reasoning into specialized agents and employs graph-specific KV-cache management and pipelined execution to improve accuracy, reduce token usage and latency, and increase throughput.
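One serving-side idea named in the contributions is priority-based KV-cache eviction. A minimal sketch of that general technique is below, assuming a scheme where shared graph-context prefixes are scored higher than per-agent scratch prompts; the class, scoring, and API are assumptions for illustration, not the paper's design.

```python
# Illustrative priority-based eviction for cached KV prefixes; the
# priority scheme and interface are assumptions, not GLM's actual code.
import heapq

class PriorityKVCache:
    """Holds at most `capacity` cached prefixes; when full, evicts the
    entry with the lowest priority rather than the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # prefix_id -> (priority, kv_blocks)
        self.heap = []     # min-heap of (priority, prefix_id); may hold stale items

    def put(self, prefix_id, kv_blocks, priority):
        # e.g. a graph context reused by several agents gets a high
        # priority, so it outlives one-off per-agent prompts.
        self.entries[prefix_id] = (priority, kv_blocks)
        heapq.heappush(self.heap, (priority, prefix_id))
        while len(self.entries) > self.capacity:
            self._evict_lowest()

    def _evict_lowest(self):
        # Pop until we find a live, non-stale entry, then drop it.
        while self.heap:
            priority, pid = heapq.heappop(self.heap)
            if pid in self.entries and self.entries[pid][0] == priority:
                del self.entries[pid]
                return

    def get(self, prefix_id):
        entry = self.entries.get(prefix_id)
        return entry[1] if entry else None
```

The design choice a priority policy captures, versus plain LRU, is that a prefix's value depends on how many future agent calls will reuse it, not on when it was last touched.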

Business Value

Enables more efficient and accurate reasoning over complex knowledge graphs using LLMs, which can power advanced Q&A systems, intelligent assistants, and knowledge discovery platforms.